期刊
IEEE ACCESS
卷 10, 期 -, 页码 80858-80868出版社
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3195488
关键词
k-nearest neighbor search; random projection trees; random projection forests; unsupervised learning
Partitioning trees are efficient for k-nearest neighbor search, but kd-trees can be ineffective in high dimensions. Random projection trees (rpTrees) solve this problem and are influenced by point dispersion and the number of rpTrees in an rpForest.
Partitioning trees are efficient data structures for k-nearest neighbor search. Machine learning libraries commonly use a special type of partitioning trees called kd-trees to perform k-nn search. Unfortunately, kd-trees can be ineffective in high dimensions because they need more tree levels to decrease the vector quantization (VQ) error. Random projection trees rpTrees solve this scalability problem by using random directions to split the data. A collection of rpTrees is called rpForest. k-nn search in an rpForest is influenced by two factors: 1) the dispersion of points along the random direction and 2) the number of rpTrees in the rpForest. In this study, we investigate how these two factors affect the k-nn search with varying k values and different datasets. We found that with larger number of trees, the dispersion of points has a very limited effect on the k-nn search. One should use the original rpTree algorithm by picking a random direction regardless of the dispersion of points.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据