Clustering Categorical Data via Ensembling Dissimilarity Matrices

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS (2018)

Journal

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS

Volume 27, Issue 1, Pages 195-208

Publisher

TAYLOR & FRANCIS INC

DOI: 10.1080/10618600.2017.1305278

Keywords

Categorical data; Classification and clustering; Hamming distance; High-dimensional data; Sequence alignment; Stability

Abstract

We present a technique for clustering categorical data by generating many dissimilarity matrices and combining them. We begin by demonstrating our technique on low-dimensional categorical data and comparing it to several other techniques that have been proposed. We show through simulations and examples that our method is both more accurate and more stable. Then we give conditions under which our method should yield good results in general. Our method extends to high-dimensional categorical data of equal lengths by ensembling over many choices of explanatory variables. In this context, we compare our method with two other methods. Finally, we extend our method to high-dimensional categorical data vectors of unequal length by using alignment techniques to equalize the lengths. We give an example to show that our method continues to provide useful results, in particular, providing a comparison with phylogenetic trees. Supplementary material for this article is available online.
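The general idea of building many dissimilarity matrices and combining them can be illustrated with a small sketch. This is not the authors' procedure, which the abstract does not specify. It assumes one plausible reading: compute a Hamming dissimilarity matrix on each of many random subsets of variables, average the matrices, and cluster the average with ordinary average-linkage agglomeration. The function names, the toy data, and the parameters `n_matrices` and `subset_size` are all hypothetical.

```python
import random

def hamming(a, b, cols):
    """Fraction of the selected columns on which rows a and b disagree."""
    return sum(a[j] != b[j] for j in cols) / len(cols)

def ensemble_dissimilarity(rows, n_matrices=50, subset_size=3, seed=0):
    """Average Hamming dissimilarity matrices built on random column subsets.

    Assumption: random variable subsets are one way to generate the ensemble;
    the paper may generate its matrices differently.
    """
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    total = [[0.0] * n for _ in range(n)]
    for _ in range(n_matrices):
        cols = rng.sample(range(p), subset_size)
        for i in range(n):
            for k in range(i + 1, n):
                d = hamming(rows[i], rows[k], cols)
                total[i][k] += d
                total[k][i] += d
    return [[v / n_matrices for v in row] for row in total]

def average_linkage(dist, n_clusters):
    """Agglomerative clustering (average linkage) on a dissimilarity matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise dissimilarity between the two clusters.
                d = sum(dist[i][k] for i in clusters[a] for k in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

# Toy categorical data: two groups of three rows, six variables each.
rows = [
    ("a", "a", "a", "a", "a", "a"),
    ("a", "a", "a", "a", "a", "c"),
    ("c", "a", "a", "a", "a", "a"),
    ("b", "b", "b", "b", "b", "b"),
    ("b", "b", "b", "b", "b", "c"),
    ("c", "b", "b", "b", "b", "b"),
]

D = ensemble_dissimilarity(rows)
labels = average_linkage(D, n_clusters=2)
print(labels)
```

Averaging over subsets means no single choice of variables determines the result, which is the intuition behind the stability the abstract claims. Whether this sketch reproduces that behavior on real data is not something it demonstrates.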
