Article

Intrinsic dimension estimation method based on correlation dimension and kNN method

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 235

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2021.107627

Keywords

Intrinsic dimension; Order statistics; Estimation method; Correlation dimension; k-Nearest Neighbor (kNN)

Funding

  1. National Natural Science Foundation of China [61573266]
  2. University Natural Science Research Key Projects of Anhui Province, China [KJ2019A0816]
  3. Fundamental Research Funds for the Central Universities, China [YJS2107]
  4. Natural Science Basic Research Program of Shaanxi, China [2021JM-133]
  5. General Project of Natural Science of Anhui Science and Technology University, China [2021zryb25]

In practical problems, high-dimensional data often exhibits a low-dimensional structure whose dimension can be estimated using correlation dimension methods. However, these methods tend to underestimate the true intrinsic dimension of the dataset.
In practical problems, high-dimensional data usually has a low-dimensional structure, or lies on a low-dimensional manifold. The dimension of this manifold is called the intrinsic dimension of the data. Among the many intrinsic dimension estimation methods, those based on the correlation dimension have received extensive attention. However, methods based on the correlation dimension often report a dimension lower than the true intrinsic dimension of the dataset. To explore the reasons behind this underestimation, the probabilities of underestimation, overestimation and proper estimation are analyzed using order statistics. The analysis shows that the probability of underestimation is much higher than that of the other two cases, a result that is verified by simulation experiments. Based on this analysis, a new intrinsic dimension estimation method is proposed that combines the correlation dimension with the k-nearest neighbor (kNN) method and effectively reduces the underestimation. The method is implemented using two algorithms, namely a search algorithm and a matching algorithm. Comprehensive experimental studies on simulated and real datasets show that the proposed algorithms are more effective than the comparison methods. (C) 2021 Elsevier B.V. All rights reserved.
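
For readers unfamiliar with the correlation dimension mentioned in the abstract, the following is a minimal sketch of the classical two-radius estimate: the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r. It is not the search or matching algorithm proposed in the paper, which the abstract does not specify. The function names, the radii, and the synthetic dataset (a 2-dimensional square embedded isometrically in 5 dimensions) are all assumptions chosen for illustration.

```python
import numpy as np


def correlation_integral(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(n, k=1)  # count each unordered pair once
    return np.mean(dists[upper] < r)


def correlation_dimension(points, r_small, r_large):
    """Two-radius slope of log C(r) versus log r (illustrative, not the paper's method)."""
    c_small = correlation_integral(points, r_small)
    c_large = correlation_integral(points, r_large)
    return (np.log(c_large) - np.log(c_small)) / (np.log(r_large) - np.log(r_small))


def sample_embedded_plane(n=600, ambient_dim=5, seed=0):
    """Hypothetical test data: uniform points on a unit square, embedded in ambient_dim."""
    rng = np.random.default_rng(seed)
    flat = rng.uniform(0.0, 1.0, size=(n, 2))
    # Orthonormal columns, so the embedding preserves pairwise distances.
    basis, _ = np.linalg.qr(rng.normal(size=(ambient_dim, 2)))
    return flat @ basis.T


if __name__ == "__main__":
    pts = sample_embedded_plane()
    print("ambient dimension:", pts.shape[1])
    print("estimated intrinsic dimension:", correlation_dimension(pts, 0.05, 0.2))
```

The true intrinsic dimension of this synthetic dataset is 2 by construction, so comparing the printed estimate against 2 gives a quick way to observe the direction of any bias for a given sample size and choice of radii.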
