Article

A classification framework for multivariate compositional data with Dirichlet feature embedding

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 212, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2020.106614

Keywords

Multivariate compositional data; Classification; Feature embedding; Dirichlet distribution; Support vector machine

Funding

  1. National Natural Science Foundation of China (NSFC) [72001222, 61832001, 61702016]
  2. National Key Research and Development Program of China [2018YFB1004403]
  3. PKU-Baidu Fund [2019BD006]
  4. Beijing Academy of Artificial Intelligence (BAAI)
  5. PKU-Tencent joint research Lab


This paper proposes an effective framework for classifying multivariate compositional data. It uses Dirichlet feature embedding to remove the unit-sum constraint, obtain high-quality training data, and reduce dimensionality, and then builds the classification model with a support vector machine. Results from a simulation study and a real-world dataset show that the proposed method performs well.

Compositional data, which carry relative or structural information about a whole, occur commonly in many disciplines and practical scenarios. Yet relatively few machine learning works address the classification of multivariate compositional data whose variables have different numbers of parts. This is because compositional data are inherently constrained to a unit sum, so existing methods cannot be applied directly. In particular, multivariate analysis methods for compositional variables with unequal numbers of parts have not been sufficiently investigated. Moreover, designing a good classification model is genuinely complicated: besides the learning algorithm, data quality is also an essential determinant, yet it has rarely been considered. In this paper, we propose an effective framework for multivariate compositional data classification. Specifically, we propose Dirichlet feature embedding, applied to the original compositional features, to remove the constraint, obtain high-quality training data, and reduce the dimension. A support vector machine is then used to build the classification model. Results on a simulation study and a real-world dataset show that the proposed method achieves good performance. (C) 2020 Elsevier B.V. All rights reserved.
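The abstract does not spell out how the Dirichlet feature embedding is constructed, so the following is only a hypothetical sketch of one way such an embedding could work, not the authors' method. Each compositional variable is mapped to its Dirichlet log-density under each class, with class-wise Dirichlet parameters estimated by the method of moments (an assumed, simple stand-in for maximum likelihood). The helper names (`closure`, `dirichlet_log_pdf`, `fit_dirichlet_moments`, `fit_class_params`, `embed`) and the toy data are invented for illustration. The sketch shows the two properties the abstract claims: the output features are unconstrained real numbers, and with K compositional variables and C classes the feature count is K × C regardless of how many parts each variable has.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical types: one compositional variable is a list of positive parts summing
# to 1; a sample holds K such variables, which may have different numbers of parts.
Composition = Sequence[float]
Sample = Sequence[Composition]
Params = Dict[int, List[List[float]]]  # params[c][k] = Dirichlet alpha for class c, variable k


def closure(parts: Sequence[float]) -> List[float]:
    """Rescale positive parts so they sum to 1 (the unit-sum constraint)."""
    total = sum(parts)
    return [p / total for p in parts]


def dirichlet_log_pdf(x: Composition, alpha: Sequence[float]) -> float:
    """Log-density of a strictly positive composition x under Dirichlet(alpha)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))


def fit_dirichlet_moments(comps: List[Composition]) -> List[float]:
    """Method-of-moments Dirichlet fit (assumed stand-in for maximum likelihood).

    Uses Var(X_j) = m_j (1 - m_j) / (alpha_0 + 1) and averages the implied
    precision alpha_0 over parts; needs at least two compositions.
    """
    n, d = len(comps), len(comps[0])
    means = [sum(c[j] for c in comps) / n for j in range(d)]
    estimates = []
    for j in range(d):
        var = sum((c[j] - means[j]) ** 2 for c in comps) / (n - 1)
        if var > 0:
            estimates.append(means[j] * (1.0 - means[j]) / var - 1.0)
    # Fall back to alpha_0 = 1 when no part varies; clamp to keep alpha positive.
    alpha0 = max(sum(estimates) / len(estimates), 1e-3) if estimates else 1.0
    return [alpha0 * m for m in means]


def fit_class_params(samples: List[Sample], labels: List[int]) -> Params:
    """Fit one Dirichlet per (class, compositional variable) pair."""
    params: Params = {}
    for c in sorted(set(labels)):
        members = [s for s, y in zip(samples, labels) if y == c]
        params[c] = [fit_dirichlet_moments([s[k] for s in members])
                     for k in range(len(members[0]))]
    return params


def embed(sample: Sample, params: Params) -> List[float]:
    """Map K compositions to K * C unconstrained features (class-wise log-densities)."""
    return [dirichlet_log_pdf(sample[k], params[c][k])
            for c in sorted(params) for k in range(len(sample))]


# Toy data (invented): two compositional variables with 3 and 2 parts, two classes.
train = [
    (closure([6, 3, 1]), closure([7, 3])),
    (closure([5, 3, 2]), closure([8, 2])),
    (closure([7, 2, 1]), closure([6, 4])),
    (closure([2, 3, 5]), closure([3, 7])),
    (closure([1, 4, 5]), closure([2, 8])),
    (closure([2, 2, 6]), closure([4, 6])),
]
labels = [0, 0, 0, 1, 1, 1]
params = fit_class_params(train, labels)
features = [embed(s, params) for s in train]  # 6 rows, 2 variables * 2 classes = 4 columns
```

The resulting `features` rows could then be passed to any support vector machine, for example scikit-learn's `sklearn.svm.SVC().fit(features, labels)`. In a real pipeline the Dirichlet parameters should be fitted on training data only, so that test samples are embedded without leaking label information.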
