4.7 Article

Accurate Estimation of the Intrinsic Dimension Using Graph Distances: Unraveling the Geometric Complexity of Datasets

期刊

SCIENTIFIC REPORTS
卷 6, 期 -, 页码 -

出版社

NATURE PUBLISHING GROUP
DOI: 10.1038/srep31377

关键词

-

资金

  1. National Science Foundation [ACI-1614804]
  2. National Institutes of Health, National Institute for General Medical Sciences [P01-GM55876]

向作者/读者索取更多资源

The collective behavior of a large number of degrees of freedom can be often described by a handful of variables. This observation justifies the use of dimensionality reduction approaches to model complex systems and motivates the search for a small set of relevant collective variables. Here, we analyze this issue by focusing on the optimal number of variable needed to capture the salient features of a generic dataset and develop a novel estimator for the intrinsic dimension (ID). By approximating geodesics with minimum distance paths on a graph, we analyze the distribution of pairwise distances around the maximum and exploit its dependency on the dimensionality to obtain an ID estimate. We show that the estimator does not depend on the shape of the intrinsic manifold and is highly accurate, even for exceedingly small sample sizes. We apply the method to several relevant datasets from image recognition databases and protein multiple sequence alignments and discuss possible interpretations for the estimated dimension in light of the correlations among input variables and of the information content of the dataset.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据