4.7 Article

Random subspace and random projection nearest neighbor ensembles for high dimensional data

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 191

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.116078

Keywords

Nearest neighbor ensemble; High dimensional data; Random subspace method; Random projection method

The random subspace and random projection methods were investigated for forming ensembles of nearest neighbor classifiers in high dimensional feature spaces, with results showing improvements in predictive performance compared to standard nearest neighbor classifiers. The choice between the two methods depends on the type of data, with random projection outperforming random subspace for microarray and chemoinformatics datasets, while the opposite is true for image datasets. Additionally, the resulting ensembles using random projection perform on par with random forests for microarray and chemoinformatics datasets.
The random subspace and the random projection methods are investigated and compared as techniques for forming ensembles of nearest neighbor classifiers in high dimensional feature spaces. The two methods have been empirically evaluated on three types of high-dimensional datasets: microarrays, chemoinformatics, and images. Experimental results on 34 datasets show that both the random subspace and the random projection methods lead to improvements in predictive performance compared to using the standard nearest neighbor classifier, while the best method to use depends on the type of data considered; for the microarray and chemoinformatics datasets, random projection outperforms the random subspace method, while the opposite holds for the image datasets. An analysis using data complexity measures, such as attribute to instance ratio and Fisher's discriminant ratio, provides more detailed indications of what relative performance can be expected for specific datasets. The results also indicate that the resulting ensembles may be competitive with state-of-the-art ensemble classifiers; the nearest neighbor ensembles using random projection perform on par with random forests for the microarray and chemoinformatics datasets.
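
The two ensemble constructions compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a 1-nearest-neighbor base classifier, a dense Gaussian projection matrix, and plain majority voting, none of which the abstract specifies. All function and parameter names (`subspace_map`, `projection_map`, `build_ensemble`, `n_members`, `n_dims`) and the toy data are hypothetical, and only the Python standard library is used to keep the example self-contained.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def subspace_map(n_features, n_dims, rng):
    """Random subspace: keep a random subset of the original features."""
    idx = rng.sample(range(n_features), n_dims)
    return lambda x: [x[i] for i in idx]


def projection_map(n_features, n_dims, rng):
    """Random projection: multiply by a random Gaussian matrix (an assumption)."""
    R = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_dims)]
    return lambda x: [sum(r * v for r, v in zip(row, x)) for row in R]


def build_ensemble(train_X, train_y, make_map, n_members=25, n_dims=5, seed=0):
    """Each member stores its own feature map and the training data seen through it."""
    rng = random.Random(seed)
    n_features = len(train_X[0])
    members = []
    for _ in range(n_members):
        f = make_map(n_features, n_dims, rng)
        members.append((f, [f(x) for x in train_X]))
    return members, train_y


def ensemble_predict(ensemble, x, k=1):
    """Combine the members' nearest neighbor predictions by majority vote."""
    members, train_y = ensemble
    votes = Counter(knn_predict(tX, train_y, f(x), k) for f, tX in members)
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=20, n_features=50, seed=1):
    """Hypothetical two-class data with more features than instances."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 10.0)):
        for _ in range(n_per_class):
            X.append([center + rng.gauss(0.0, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


X, y = make_toy_data()
rs_ensemble = build_ensemble(X, y, subspace_map)
rp_ensemble = build_ensemble(X, y, projection_map)
print(ensemble_predict(rs_ensemble, [0.0] * 50))
print(ensemble_predict(rp_ensemble, [10.0] * 50))
```

The only difference between the two ensembles is the feature map handed to `build_ensemble`: the random subspace map selects original attributes, whereas the random projection map forms random linear combinations of all of them. A real comparison would tune `n_members`, `n_dims`, and `k`, and would evaluate on held-out data.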
