Journal
NEUROCOMPUTING
Volume 501, Issue -, Pages 41-74
Publisher
ELSEVIER
DOI: 10.1016/j.neucom.2022.05.118
Keywords
Clustering; Mixtures; Von Mises-Fisher; Expectation maximization; High dimensional data; Path following strategy; Model selection
The article presents a method for clustering data on the unit hypersphere using mixtures of von Mises-Fisher distributions, a model well suited to high-dimensional directional data such as texts. Estimating the mixture with an ℓ1-penalized likelihood yields sparse prototypes, which makes the clusters easier to interpret. The approach is studied on simulated data and evaluated on real data benchmarks. A new data set on financial reports is also introduced to illustrate the benefits of the method for exploratory analysis.
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly well adapted to high-dimensional directional data such as texts. We propose in this article to estimate a von Mises-Fisher mixture using an ℓ1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on real data benchmarks. We also introduce a new data set on financial reports and exhibit the benefits of our method for exploratory analysis. (C) 2022 Elsevier B.V. All rights reserved.
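For orientation, the objective described in the abstract can be sketched as follows. The von Mises-Fisher density and the E-step responsibilities are the standard textbook forms. The exact shape of the penalty (a single weight β applied to the ℓ1 norms of the mean directions μ_k) is an assumption made here for illustration, since the abstract only states that an ℓ1 penalty on the likelihood produces sparse prototypes. The symbols K, π_k, μ_k, κ_k, β and τ_ik are notation chosen for this sketch and may differ from the paper's.

```latex
% von Mises-Fisher density on the unit hypersphere in R^d (standard form)
f(x \mid \mu, \kappa) = c_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
\qquad \|x\|_2 = \|\mu\|_2 = 1,\quad \kappa \ge 0,
\qquad
c_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)}

% Log-likelihood of a K-component mixture for observations x_1, ..., x_n
L(\Theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k\, f(x_i \mid \mu_k, \kappa_k)

% Penalized criterion to maximise (ASSUMED form of the l1 penalty)
L_{\beta}(\Theta) = L(\Theta) - \beta \sum_{k=1}^{K} \|\mu_k\|_1

% E-step responsibilities (standard for any finite mixture)
\tau_{ik} = \frac{\pi_k\, f(x_i \mid \mu_k, \kappa_k)}
                 {\sum_{l=1}^{K} \pi_l\, f(x_i \mid \mu_l, \kappa_l)}
```

Here I_ν denotes the modified Bessel function of the first kind. Under this reading, β = 0 recovers the unpenalized mixture, and larger values of β push more coordinates of each μ_k to zero, which is the trade-off a path following strategy would trace out.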