4.7 Article

Uncertainty clustering internal validity assessment using Frechet distance for unsupervised learning

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2023.106635

Keywords

Unsupervised learning; Clustering validity; Frechet distance; Type-2 fuzzy sets

Knowing the number of clusters in unsupervised learning is difficult, and current clustering internal validity indices (CIVIs) do not work well in all scenarios. To address this problem, a new Uncertainty Frechet (UF) CIVI is proposed, which assesses the certainty of a partition using uncertainty fingerprints and the Frechet distance between clusters. UF is integrated into a merging methodology that combines similar clusters, so the clustering algorithm does not need to be run iteratively. Evaluations on synthetic and real datasets show that UF is effective, with an Adjusted Mutual Information (AMI) score above 0.88 on normal, uniform, gamma, and triangular distribution datasets. The results indicate that the UF index is a suitable tool for researchers and practitioners working with highly uncertain data.
Knowing the number of clusters a priori is one of the most challenging aspects of unsupervised learning. Clustering Internal Validity Indices (CIVIs) evaluate the partitions produced by unsupervised algorithms using metrics such as compactness, separation, and density. Although specialized CIVIs have been designed for specific applications, no general CIVI works in all scenarios. The absence of CIVIs based on crisp uncertainty metrics is especially critical in decision-making processes that involve ambiguity, non-convex distributions, outliers, and overlapping data. To address this problem, we propose a novel Uncertainty Frechet (UF) CIVI that assesses the certainty of a well-defined partition. UF combines uncertainty fingerprints based on Type-2 fuzzy Gaussian Mixture Models (T2FGMM) with the Frechet distance between clusters to produce a metric of partition quality. We integrate UF into a merging methodology that combines similar clusters within a partition, which allows us to determine the number of clusters without running the clustering algorithm iteratively, as other CIVIs require. We evaluate our proposal on 5,250 convex and 36 non-convex synthetic datasets and on five benchmark real datasets. In addition, we apply UF in a real-world scenario that involves high uncertainty: Passive Acoustic Monitoring (PAM) of ecosystems, which aims to study ecological transformations through acoustic recordings. The results show that UF performs well in synthetic and real-world scenarios, obtaining an Adjusted Mutual Information (AMI) score higher than 0.88 for normal, uniform, gamma, and triangular distribution datasets. In the PAM application, clustering algorithms combined with UF identify the transformation of ecosystems through sound, achieving an F1 score of 0.84. These results indicate that the UF index is a suitable tool for researchers and practitioners working with highly uncertain data.
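
To make the "Frechet distance between clusters" concrete, the following is a minimal sketch of one common closed form: the Frechet distance between two multivariate Gaussians, with each cluster summarized by its sample mean and covariance. This is an assumption made for illustration only. The abstract does not state which form of the Frechet distance UF uses, and the sketch does not include the Type-2 fuzzy uncertainty fingerprints or the merging step. The function names `frechet_distance_gaussians` and `fit_gaussian` are hypothetical.

```python
import numpy as np


def frechet_distance_gaussians(mu1, cov1, mu2, cov2):
    """Frechet distance between N(mu1, cov1) and N(mu2, cov2).

    Closed form (assumed here, not taken from the paper):
        d^2 = ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2))
    """
    mu1 = np.atleast_1d(mu1).astype(float)
    mu2 = np.atleast_1d(mu2).astype(float)
    cov1 = np.atleast_2d(cov1).astype(float)
    cov2 = np.atleast_2d(cov2).astype(float)

    mean_term = np.sum((mu1 - mu2) ** 2)

    # Tr((cov1 cov2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov1 @ cov2, which are real and non-negative for
    # positive semi-definite covariances. Clip guards against round-off.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    sqrt_trace = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))

    d2 = mean_term + np.trace(cov1) + np.trace(cov2) - 2.0 * sqrt_trace
    return float(np.sqrt(max(d2, 0.0)))


def fit_gaussian(points):
    """Summarize a cluster (n_samples x n_features) by mean and covariance."""
    points = np.asarray(points, dtype=float)
    return points.mean(axis=0), np.cov(points, rowvar=False)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cluster_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))
    cluster_b = rng.normal(loc=[0.5, 0.0], scale=1.0, size=(500, 2))
    cluster_c = rng.normal(loc=[8.0, 8.0], scale=1.0, size=(500, 2))

    d_ab = frechet_distance_gaussians(*fit_gaussian(cluster_a), *fit_gaussian(cluster_b))
    d_ac = frechet_distance_gaussians(*fit_gaussian(cluster_a), *fit_gaussian(cluster_c))
    # Overlapping clusters (a, b) should be much closer than separated ones (a, c).
    print(f"d(a, b) = {d_ab:.3f}, d(a, c) = {d_ac:.3f}")
```

Under this assumption, a small distance between two clusters marks them as candidates for merging, while a large distance suggests they should stay separate. Any merging threshold would be a further design choice that the abstract does not specify.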
