Journal
APPLIED SOFT COMPUTING
Volume 130, Issue -, Pages -
Publisher
ELSEVIER
DOI: 10.1016/j.asoc.2022.109704
Keywords
Probabilistic distance clustering; Mixed-type data; Fuzzy clustering
Funding
- San Jose State University Mathematics and Statistics department [3415040090]
- Central RSCA of San Jose State University [18-RSG-08-046]
This paper discusses a probabilistic distance clustering method adjusted for cluster size (PDQ) for handling mixed-type data, shows its advantages through a simulation design, and applies it to a real data set.
Cluster analysis is a widely used unsupervised data analysis technique for finding groups of homogeneous units in a data set. Probabilistic distance clustering adjusted for cluster size (PDQ), discussed in this contribution, belongs to the broad category of clustering methods initially developed for continuous data; it has the advantages of fuzzy membership and robustness. However, a common issue in clustering is the treatment of mixed-type data, that is, data combining continuous and categorical variables, which are among the most common types of data. This paper extends PDQ to mixed-type data by using different dissimilarities for different kinds of variables. First, PDQ for mixed-type data is defined; then a simulation design shows its advantages compared to some state-of-the-art techniques; finally, it is applied to a real data set. The conclusion includes some future developments. © 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
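To make the idea of "different dissimilarities for different kinds of variables" concrete, the following is a minimal sketch, not the authors' implementation. It assumes Euclidean distance on continuous variables plus a simple-matching mismatch rate on categorical ones (the abstract does not state which dissimilarities the paper uses), and it assumes the usual PDQ principle that p_k * d_k / q_k is constant across clusters, where q_k is the cluster size. The function names `mixed_dissimilarity` and `pdq_memberships` are hypothetical.

```python
import numpy as np


def mixed_dissimilarity(x_num, x_cat, c_num, c_cat):
    """Dissimilarity between a unit and a cluster center on mixed-type data.

    Assumption: Euclidean distance on the continuous part plus the share of
    mismatching categories on the categorical part. The paper may combine
    or weight the two parts differently.
    """
    x_num, c_num = np.asarray(x_num, float), np.asarray(c_num, float)
    x_cat, c_cat = np.asarray(x_cat), np.asarray(c_cat)
    d_num = np.sqrt(np.sum((x_num - c_num) ** 2))
    d_cat = float(np.mean(x_cat != c_cat)) if x_cat.size else 0.0
    return d_num + d_cat


def pdq_memberships(d, q):
    """Fuzzy memberships p_k with p_k * d_k / q_k constant and sum(p_k) = 1.

    d: dissimilarities of one unit to each of the K cluster centers.
    q: cluster sizes (any positive weights).
    """
    d = np.asarray(d, float)
    q = np.asarray(q, float)
    w = q / np.maximum(d, 1e-12)  # guard against a zero dissimilarity
    return w / w.sum()


# One unit, two clusters of equal size
p = pdq_memberships(d=[1.0, 3.0], q=[0.5, 0.5])
```

Under these assumptions, a unit three times closer to the first center than to the second receives a membership three times larger for the first cluster when the cluster sizes are equal; a larger q_k shifts membership toward cluster k.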