4.7 Article

Clustering mixed-type data using a probabilistic distance algorithm

Journal

APPLIED SOFT COMPUTING
Volume 130, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2022.109704

Keywords

Probabilistic distance clustering; Mixed-type data; Fuzzy clustering

Funding

  1. San Jose State University Mathematics and Statistics department [3415040090]
  2. Central RSCA of San Jose State University [18-RSG-08-046]

This paper discusses a probabilistic distance clustering method adjusted for cluster size (PDQ) for handling mixed-type data, shows its advantages through a simulation design, and applies it to a real data set.
Cluster analysis is a widely used unsupervised data analysis technique for finding groups of homogeneous units in a data set. Probabilistic distance clustering adjusted for cluster size (PDQ), discussed in this contribution, belongs to the broad category of clustering methods initially developed for continuous data; it offers fuzzy membership and robustness. However, a common issue in clustering is the treatment of mixed-type data, in which continuous and categorical variables, among the most common types of data, occur together. This paper extends PDQ to mixed-type data by using different dissimilarities for different kinds of variables. First, PDQ for mixed-type data is defined; then a simulation design shows its advantages over some state-of-the-art techniques; finally, it is applied to a real data set. The conclusion outlines some future developments. © 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
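
The idea of pairing a per-variable-type dissimilarity with size-adjusted fuzzy memberships can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which dissimilarities are used, so the Euclidean-plus-simple-matching combination below is an assumption, and the function names `mixed_dissimilarity` and `pdq_memberships` are hypothetical. The membership rule follows the general PDQ principle that, for each point, membership probability times distance divided by cluster size is constant across clusters.

```python
import numpy as np

def mixed_dissimilarity(x_num, x_cat, c_num, c_cat):
    """Dissimilarity between a unit and a cluster centre on mixed-type data.

    Assumed for illustration: Euclidean distance on the continuous part
    plus the simple-matching mismatch rate on the categorical part.
    """
    diff = np.asarray(x_num, dtype=float) - np.asarray(c_num, dtype=float)
    d_num = np.sqrt(np.sum(diff ** 2))
    if len(x_cat) > 0:
        d_cat = np.mean(np.asarray(x_cat) != np.asarray(c_cat))
    else:
        d_cat = 0.0
    return d_num + d_cat

def pdq_memberships(dists, sizes):
    """Fuzzy memberships with p_k proportional to q_k / d_k.

    dists: dissimilarities from one unit to each of the K centres.
    sizes: cluster sizes q_k (any positive scale).
    """
    dists = np.asarray(dists, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    zero = dists == 0
    if zero.any():
        # The unit coincides with a centre: assign all mass there.
        p = zero.astype(float)
        return p / p.sum()
    weights = sizes / dists
    return weights / weights.sum()
```

For example, a unit at dissimilarities 1 and 3 from two equally sized clusters gets memberships 0.75 and 0.25 from `pdq_memberships([1, 3], [1, 1])`; if the farther cluster is three times larger, the memberships become equal.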
