Article

A categorical data clustering framework on graph representation

Journal

PATTERN RECOGNITION
Volume 128

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.108694

Keywords

Cluster analysis; Categorical data clustering; Data representation; Graph embedding

Funding

  1. National Key Research and Development Program of China [2020AAA0106100]
  2. National Natural Science Foundation of China [62022052]
  3. Technology Research Development Projects of Shanxi [201901D211192]

This paper introduces a graph-based framework for clustering categorical data. The proposed method learns representations of categorical values from their similarity graph, so that similar values receive similar representations. Experiments on benchmark data sets show that the framework is effective compared with other representation methods.

Clustering categorical data is an important task in machine learning, since this type of data is widespread in the real world. However, the domains of categorical features lack an inherent order, which prevents most classical clustering algorithms from being applied directly to such data. Learning an appropriate representation of categorical data is therefore a key issue for the clustering task. To address this issue, we develop a categorical data clustering framework based on graph representation. Within this framework, we propose a graph-based representation method for categorical data, which learns the representation of categorical values from their similarity graph so that similar categorical values receive similar representations. We compared the proposed framework with other representation methods for categorical data clustering on benchmark data sets. The experimental results show that the proposed framework is very effective compared with the other methods. (c) 2022 Elsevier Ltd. All rights reserved.
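
The abstract does not say how the similarity graph is built or how representations are learned from it, so the following Python sketch is only one plausible reading of the pipeline, not the authors' algorithm. It assumes that two categorical values are similar when their co-occurrence profiles have high cosine similarity, that values are embedded with the leading eigenvectors of the normalized graph, and that objects are then clustered with plain k-means. All function names (`value_similarity_graph`, `embed_values`, `kmeans`, `cluster_categorical`) and the toy data are hypothetical.

```python
import numpy as np


def value_similarity_graph(X):
    """Build a weighted graph whose nodes are (feature, value) pairs.

    Assumption (not from the paper): edge weight = cosine similarity
    between the co-occurrence profiles of two values.
    """
    n, m = X.shape
    nodes = sorted({(j, str(X[i, j])) for i in range(n) for j in range(m)})
    index = {node: pos for pos, node in enumerate(nodes)}
    # Indicator matrix: H[i, v] = 1 if object i takes categorical value v.
    H = np.zeros((n, len(nodes)))
    for i in range(n):
        for j in range(m):
            H[i, index[(j, str(X[i, j]))]] = 1.0
    C = H.T @ H                              # co-occurrence counts
    P = C / C.sum(axis=1, keepdims=True)     # row-normalized profiles
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    W = (P @ P.T) / (norms @ norms.T)        # cosine similarity
    np.fill_diagonal(W, 0.0)                 # no self-loops
    return W, H


def embed_values(W, dim):
    """Spectral embedding: top eigenvectors of D^-1/2 W D^-1/2."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    return vecs[:, -dim:]


def kmeans(Z, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):          # keep old center if cluster is empty
                centers[c] = Z[labels == c].mean(axis=0)
    return labels


def cluster_categorical(X, k, dim):
    """Embed values via the graph, average them per object, then cluster."""
    X = np.asarray(X)
    W, H = value_similarity_graph(X)
    E = embed_values(W, dim)
    Z = H @ E / X.shape[1]                   # object = mean of its value embeddings
    return kmeans(Z, k)


# Toy data (hypothetical): two groups of objects with disjoint values.
X_toy = [["a", "x", "p"]] * 3 + [["b", "y", "q"]] * 3
labels = cluster_categorical(X_toy, k=2, dim=2)
```

On this toy input the first three objects share one label and the last three share the other, because values that always co-occur end up with the same embedding. Real data would need a more careful similarity measure and a principled choice of `dim`, neither of which the abstract specifies.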
