Article

Variable Selection in Compositional Data Analysis Using Pairwise Logratios

Journal

MATHEMATICAL GEOSCIENCES
Volume 51, Issue 5, Pages 649-682

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11004-018-9754-x

Keywords

Compositional data; Logratio transformation; Logratio analysis; Logratio distance; Multivariate analysis; Ratios; Subcompositional coherence; Univariate statistics; Variable selection

In the approach to compositional data analysis originated by John Aitchison, a set of linearly independent logratios (i.e., ratios of compositional parts, logarithmically transformed) explains all the variability in a compositional data set. Such a set of ratios can be represented by an acyclic connected graph of all the parts, in which the number of edges is one less than the number of parts. There are many such candidate sets of ratios, each of which explains 100% of the compositional logratio variance. A simple choice is to use additive logratios, and it is demonstrated how to identify one set that can serve as a substitute for the original data set in the sense of best approximating the essential multivariate structure. When all pairwise ratios of parts are candidates for selection, a smaller set of ratios, explaining as much variability as required to reveal the underlying structure of the data, can be determined by automatic selection, preferably assisted by expert knowledge. Conventional univariate statistical summary measures as well as multivariate methods can be applied to these ratios. Such a selection of a small set of ratios also implies the choice of a subset of parts, that is, a subcomposition, which explains a maximum percentage of variance. This approach of ratio selection, designed to simplify the task of the practitioner, is illustrated on an archaeometric data set as well as three further data sets in an Appendix. Comparisons are also made with existing proposals for selecting variables in compositional data analysis.
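
The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes unweighted parts, measures "explained logratio variance" as the share of centred-logratio (CLR) variance recovered by a least-squares fit on the chosen logratios, and uses a plain greedy forward search as a stand-in for the automatic selection. The function names `clr`, `explained_fraction`, and `greedy_select` are hypothetical.

```python
from itertools import combinations

import numpy as np


def clr(X):
    """Centred logratio transform of a matrix of strictly positive compositions."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)


def explained_fraction(X, pairs):
    """Share of total (unweighted) logratio variance explained by the
    pairwise logratios log(x_j / x_k) for (j, k) in `pairs`.

    Assumption: "explained" means the fit of a least-squares regression
    of the column-centred CLR data on the column-centred logratios.
    """
    Y = clr(X)
    Y = Y - Y.mean(axis=0)
    L = np.log(X)
    Z = np.column_stack([L[:, j] - L[:, k] for j, k in pairs])
    Z = Z - Z.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    fitted = Z @ coef
    return float((fitted ** 2).sum() / (Y ** 2).sum())


def greedy_select(X, n_ratios):
    """Forward selection: at each step add the pairwise logratio that
    raises the explained fraction the most (hypothetical helper)."""
    candidates = list(combinations(range(X.shape[1]), 2))
    chosen = []
    for _ in range(n_ratios):
        remaining = [p for p in candidates if p not in chosen]
        best = max(remaining, key=lambda p: explained_fraction(X, chosen + [p]))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Synthetic compositions, used only to exercise the functions.
    rng = np.random.default_rng(0)
    X = rng.lognormal(size=(50, 6))
    X = X / X.sum(axis=1, keepdims=True)

    # Additive logratios with the last part as the common denominator.
    alr_pairs = [(j, X.shape[1] - 1) for j in range(X.shape[1] - 1)]
    print("ALR set:", explained_fraction(X, alr_pairs))

    selected = greedy_select(X, 3)
    print("Greedy set:", selected, explained_fraction(X, selected))
```

Under these assumptions, any set of D - 1 linearly independent logratios on D parts (for instance the additive logratios) reproduces the CLR data exactly, so its explained fraction is 1 up to rounding, while a smaller selected set explains a lower share. In practice the candidate list could be restricted or reordered using expert knowledge before running the search.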
