Journal
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS
Volume 51, Issue 2, Pages 875-884
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSMC.2018.2884839
Keywords
Connectedness index; density-based clustering; eigenpath
Funding
- Natural Science Foundation of China [61471216, 61771276]
- National Key Research and Development Program of China [2016YFB0101001]
- Special Foundation for the Development of Strategic Emerging Industries of Shenzhen [JCYJ20170307153940960, JCYJ20170817161845824]
The paper introduces a method that models high-dimensional clustering problems as a one-dimensional analysis of a probability distribution. It uses eigenpaths and connectedness indices to describe the connections between vertices, draws indicative curves from those indices, recognizes arbitrary cluster forms, and partly eliminates the curse of dimensionality.
Data clustering is one of the most fundamental techniques in exploratory data analysis. It is widely used for determining underlying data structure, classifying natural data, and compressing data in engineering, business management, social statistics, computer science, and medicine. Under the assumption that clusters are high-density regions of the feature space separated by neighboring regions of relatively low density, a novel approach is proposed for modeling any high-dimensional clustering problem as a one-dimensional analysis of the probability distribution. First, a special path between two vertices, called the eigenpath, is defined to represent their close connection. Second, a connectedness index based on the eigenpath is proposed to describe the connection between two vertices quantitatively. Third, the connectedness index is applied to the candidate cluster centers to measure the connection between different candidates. An indicative curve can then be drawn from the connectedness indices. The approach provides an effective indicative curve for unknown data sets, partly eliminates the curse of dimensionality, correctly recognizes clusters of arbitrary form, and automatically excludes outliers. Extensive experiments show the effectiveness and efficiency of the proposed approach.
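The abstract does not give the definitions of the eigenpath or the connectedness index, so the following is only a rough illustration of the underlying assumption that clusters are high-density regions separated by low-density gaps. It swaps in a different, standard technique: a widest-path (bottleneck) search on a neighborhood graph, scored by the lowest kernel density met along the best path. The function names, the `bandwidth` and `radius` parameters, and the toy data are all hypothetical and are not taken from the paper.

```python
import heapq
import math


def kernel_density(points, bandwidth):
    """Unnormalized Gaussian-kernel density estimate at every point."""
    dens = []
    for p in points:
        total = 0.0
        for q in points:
            total += math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2))
        dens.append(total)
    return dens


def bottleneck_connectivity(points, a, b, bandwidth=1.0, radius=1.5):
    """Illustrative score in [0, 1]; NOT the paper's connectedness index.

    Two points are graph neighbors if they lie within `radius` of each
    other. Among all paths from a to b, the search finds the one whose
    lowest-density vertex is as dense as possible, then divides that
    bottleneck density by the smaller of the two endpoint densities.
    """
    dens = kernel_density(points, bandwidth)
    n = len(points)
    best = [0.0] * n              # best bottleneck density found so far
    best[a] = dens[a]
    heap = [(-dens[a], a)]        # max-heap emulated with negated keys
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if width < best[u]:
            continue              # stale heap entry
        if u == b:
            break                 # bottleneck value for b is final
        for v in range(n):
            if v != u and math.dist(points[u], points[v]) <= radius:
                w = min(width, dens[v])
                if w > best[v]:
                    best[v] = w
                    heapq.heappush(heap, (-w, v))
    return best[b] / min(dens[a], dens[b])


# Toy data (hypothetical): two dense groups joined by two sparse points.
GROUP_A = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (0.5, 0.5)]
BRIDGE = [(2.25, 0.0), (3.5, 0.0)]
GROUP_B = [(4.75, 0.0), (5.25, 0.0), (5.75, 0.0), (5.25, 0.5)]
POINTS = GROUP_A + BRIDGE + GROUP_B

within = bottleneck_connectivity(POINTS, 0, 2)                 # same group
across = bottleneck_connectivity(POINTS, 0, len(POINTS) - 1)   # via bridge
print(within, across)
```

In this toy setup the two points in the same group score higher than the pair that can only be joined through the sparse bridge, and a pair with no connecting path at all scores zero. A score of this kind reduces a comparison in the full feature space to a single number per pair of points, which is the general flavor of the one-dimensional analysis the abstract describes.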