4.7 Article

A GROUP FINDING ALGORITHM FOR MULTIDIMENSIONAL DATA SETS

Journal

ASTROPHYSICAL JOURNAL
Volume 703, Issue 1, Pages 1061-1077

Publisher

IOP PUBLISHING LTD
DOI: 10.1088/0004-637X/703/1/1061

Keywords

galaxies: halos; galaxies: structure; methods: data analysis; methods: numerical


We describe a density-based hierarchical group finding algorithm capable of identifying structures and substructures of any shape and density in multidimensional data sets, where each dimension can be a numeric attribute with an arbitrary measurement scale. This has applications in a wide variety of fields, from finding structures in galaxy redshift surveys, to identifying halos and subhalos in N-body simulations, to group finding in Local Group chemodynamical data sets. In general, clustering schemes require an a priori definition of a metric (a non-negative function that gives the distance between two points in a space), and the quality of the clustering depends on this choice. The general practice is to use a constant global metric, which is optimal only if the clusters in the data are self-similar. For complex data configurations, even the most finely tuned constant global metric turns out to be suboptimal. Moreover, the correct choice of metric becomes increasingly important as the number of dimensions increases. To address these problems, we present an entropy-based binary space partitioning algorithm that uses a locally adaptive metric for each data point. The metric is employed to calculate the density at each point and a list of its nearest neighbors, and this information is then used to form a hierarchy of groups. Finally, the ratio of maximum to minimum density of points in a group is used to estimate the significance of the group. Setting a threshold on this significance can effectively screen out groups arising from Poisson noise and helps organize the groups into meaningful clusters. For a data set of N points, the algorithm requires only O(N) space and O(N (log N)^3) time, which makes it ideally suited to analyzing large data sets. As an example, we apply the algorithm to identify structures in a simulated stellar halo using the full six-dimensional phase space coordinates.
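
The grouping step outlined in the abstract (per-point density plus a nearest-neighbor list, merged into a hierarchy and scored by a density ratio) can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the entropy-based partitioning and locally adaptive metric with a plain Euclidean brute-force neighbor search, so it runs in O(N^2) time rather than O(N (log N)^3). The names `knn` and `find_groups`, the parameter `k`, and the k-th-neighbor density estimate are all assumptions made for illustration.

```python
import math

def knn(points, k):
    """Brute-force k nearest neighbours as (distance, index) pairs.
    Assumption: plain Euclidean metric, not the paper's adaptive one."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result

def find_groups(points, k=3):
    """Return sorted (peak_index, size, density_ratio) tuples, one per group.
    Requires len(points) > k."""
    dim = len(points[0])
    neigh = knn(points, k)
    # k-th neighbour density estimate with constant factors dropped (assumption)
    rho = [k / max(nb[-1][0], 1e-12) ** dim for nb in neigh]

    parent, size, low = {}, {}, {}  # union-find, member count, lowest density

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    groups = []
    # Visit points from densest to sparsest, so each group's root is its peak.
    for i in sorted(range(len(points)), key=lambda n: -rho[n]):
        roots = {find(j) for _, j in neigh[i] if j in parent}
        parent[i] = i
        if not roots:
            size[i], low[i] = 1, rho[i]  # no denser neighbour: new density peak
            continue
        main = max(roots, key=lambda r: rho[r])
        for r in roots - {main}:
            # Point i bridges two groups: close the weaker one at this density.
            groups.append((r, size[r], rho[r] / rho[i]))
            parent[r] = main
            size[main] += size[r]
        parent[i] = main
        size[main] += 1
        low[main] = rho[i]
    # Groups never absorbed: ratio of peak density to their lowest density.
    for r in list(parent):
        if parent[r] == r:
            groups.append((r, size[r], rho[r] / low[r]))
    return sorted(groups)

# Two well-separated one-dimensional clumps of five points each.
pts = [(float(x),) for x in list(range(5)) + list(range(100, 105))]
for peak, n, ratio in find_groups(pts, k=3):
    print(peak, n, ratio)
```

Groups whose density ratio stays close to 1 are the ones a significance threshold would discard as noise; the threshold value itself is left to the user here, since the abstract does not specify one.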
