Article

A generalized multi-aspect distance metric for mixed-type data clustering

Journal

PATTERN RECOGNITION
Volume 138

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2023.109353

Keywords

Clustering; Mixed data; Ordinal and nominal attribute; Inter-dependency; Intra-attribute information; Mutual information

This study proposed a new measure of distance for a mixed-type data set, which considers the inter-attribute and intra-attribute information depending on the type of attributes. The proposed method utilizes entropy, Jensen-Shannon divergence, and a modified version of Mahalanobis distance. A unified framework based on mutual information is also introduced to control attributes' contribution to distance measurement. Extensive evaluation on benchmark data sets demonstrates the efficacy of the proposed method.
Distance calculation is straightforward when working with pure categorical or pure numerical data sets. Defining a unified distance to improve the clustering performance for a mixed data set composed of nominal, ordinal, and numerical attributes is very challenging due to the attributes' different natures. In this study, we proposed a new measure of distance for a mixed-type data set that regards inter-attribute information and intra-attribute information depending on the type of attributes. In this regard, entropy and Jensen-Shannon divergence concepts were used to exploit the inter-attribute information of categorical-categorical and categorical-numerical attributes, respectively. Also, a modified version of Mahalanobis distance was proposed to consider the intra- and inter-attribute information of numerical attributes. We also introduced a unified framework based on mutual information to control attributes' contribution to distance measurement. The proposed distance in conjunction with spectral clustering was extensively evaluated concerning various categorical, numerical, and mixed-type benchmark data sets, and the results demonstrated the efficacy of the proposed method. (c) 2023 Elsevier Ltd. All rights reserved.
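
To make the categorical-numerical idea more concrete, the following is a minimal sketch of how a Jensen-Shannon divergence could score the dissimilarity between two values of a categorical attribute through the distribution of a numerical attribute conditioned on each value. It is an illustration of the general concept only, not the formulation used in the paper: the histogram binning, the function names `js_divergence` and `conditional_histogram`, and the toy data are all assumptions introduced here.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    With base-2 logarithms the value lies in [0, 1]: 0 for identical
    distributions, 1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution M = (P + Q) / 2
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def conditional_histogram(values, labels, category, edges):
    """Normalized histogram of `values` over rows whose label equals `category`.

    `edges` are hypothetical bin edges chosen by the caller; bins are
    half-open [lo, hi) except the last, which also includes its upper edge.
    """
    counts = [0] * (len(edges) - 1)
    for v, lab in zip(values, labels):
        if lab != category:
            continue
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError(f"no values fall in the bins for category {category!r}")
    return [c / total for c in counts]


# Toy mixed-type data (hypothetical): one categorical and one numerical attribute.
colors = ["red", "red", "red", "blue", "blue", "blue"]
heights = [1.0, 1.2, 1.1, 3.0, 3.2, 1.05]
edges = [0.0, 2.0, 4.0]

p_red = conditional_histogram(heights, colors, "red", edges)
p_blue = conditional_histogram(heights, colors, "blue", edges)

# Dissimilarity between the categorical values "red" and "blue",
# as seen through the numerical attribute.
d_red_blue = js_divergence(p_red, p_blue)
```

In this sketch, two categorical values whose rows have similar numerical distributions receive a divergence near 0, and values whose rows occupy different numerical ranges receive a divergence closer to 1. How such scores are weighted and combined with the other components of the distance is specified in the paper itself and is not reproduced here.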
