4.2 Article

Mixed Deep Gaussian Mixture Model: a clustering model for mixed datasets

Journal

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Volume 16, Issue 1, Pages 31-53

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11634-021-00466-3

Keywords

Binary and count data; Deep Gaussian Mixture Model; Generalized Linear Latent Variable Model; MCEM algorithm; Ordinal and categorical data; Two-heads architecture

Funding

  1. Research Chair DIALog under the aegis of the Risk Foundation
  2. CNP Assurances
  3. Universite Claude Bernard Lyon 1 (UCBL)
  4. AMU
  5. CNRS
  6. ECM
  7. INdAM
  8. ISFA

The article introduces a clustering method based on a multi-layer architecture model called Mixed Deep Gaussian Mixture Model, which automatically merges clustering performed separately on continuous and non-continuous data. The model provides continuous low-dimensional representations of the data, and its performance is validated by comparing it with other state-of-the-art mixed data clustering models.
Clustering mixed data presents numerous challenges inherent to the highly heterogeneous nature of the variables. Despite this heterogeneity, a clustering algorithm should be able to extract discriminant information from the variables in order to form groups. In this work we introduce a multilayer, model-based clustering method called the Mixed Deep Gaussian Mixture Model, which can be viewed as an automatic way to merge the clusterings performed separately on continuous and non-continuous data. This architecture is flexible and can be adapted to mixed data as well as to purely continuous or purely non-continuous data. In this sense we generalize both Generalized Linear Latent Variable Models and Deep Gaussian Mixture Models. We also design a new initialisation strategy and a data-driven method that selects the best specification of the model and the optimal number of clusters for a given dataset. In addition, our model provides continuous low-dimensional representations of the data, which can be a useful tool for visualizing mixed datasets. Finally, we validate the performance of our approach by comparing its results with those of state-of-the-art mixed data clustering models on several commonly used datasets.
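
As a rough illustration of the simplest building block the abstract refers to, the sketch below fits a single-layer, two-component Gaussian mixture to one-dimensional continuous data with plain EM. It is not the Mixed Deep Gaussian Mixture Model, nor the MCEM algorithm named in the keywords: the function names (`normal_pdf`, `fit_gmm_1d`), the initialisation at the data extremes, and the synthetic data are all assumptions made for this example and do not come from the paper.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    # Crude initialisation (an assumption of this sketch): means at the data extremes.
    means = [min(xs), max(xs)]
    overall_mean = sum(xs) / len(xs)
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / len(xs)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])

        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                1e-6,  # floor to avoid a degenerate zero-variance component
            )

    # Hard cluster labels from the responsibilities of the last E-step.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


# Synthetic data: two well-separated groups (hypothetical, not from the paper).
rng = random.Random(0)
data = [rng.gauss(-3.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
weights, means, variances, labels = fit_gmm_1d(data)
```

The model described in the abstract goes well beyond this: it stacks several mixture layers and handles binary, count, ordinal and categorical variables, none of which this sketch attempts.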

