Article

Sufficient dimension reduction for compositional data

Journal

BIOSTATISTICS
Volume 22, Issue 4, Pages 687-705

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/biostatistics/kxz060

Keywords

Count data; Penalized likelihood; Prediction; Regression; Sufficient statistic; Visualization

Funding

  1. National Agency for the Promotion of Science and Technology of Argentina [PICT-2016-2515]
  2. Binational Scientific Cooperation Program CONICET-National Institutes of Health
  3. Intramural program of the NCI


Recent efforts to characterize the human microbiome and its relation to chronic diseases have led to a surge in statistical development for compositional data. We develop likelihood-based sufficient dimension reduction (SDR) methods to find linear combinations that contain all the information in the compositional data about an outcome variable, i.e., that are sufficient for modeling and prediction of the outcome. We consider several models for the inverse regression of the compositional vector, or of transformations of it, as a function of the outcome. They include normal, multinomial, and Poisson graphical models that allow for complex dependencies among observed counts. These methods yield efficient estimators of the reduction and can be applied to continuous or categorical outcomes. We incorporate variable selection into the estimation via penalties and address important invariance issues arising from the compositional nature of the data. We illustrate and compare our methods and some established methods for analyzing microbiome data in simulations and using data from the Human Microbiome Project. Displaying the data in the coordinate system of the SDR linear combinations allows visual inspection and facilitates comparisons across studies.
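As a rough illustration of the normal inverse-regression idea for a categorical outcome, the sketch below is not the authors' estimator. It makes several assumptions. The compositions are mapped with an additive log-ratio (ALR) transform that uses the last part as reference; the paper may use other transformations. The transformed vector is assumed to satisfy Z | Y = c ~ N(mu_c, Sigma) with a covariance shared across classes. There is no penalty and no variable selection. Under that model the reduction is spanned by Sigma^{-1}(mu_c - mu). The code estimates it with simple plug-in moments, which here coincides with linear discriminant directions. The function names `alr` and `normal_inverse_reduction` are hypothetical. Because the ALR uses only ratios, rescaling a sample (for example, by its sequencing depth) leaves the reduction unchanged. This is one simple instance of the invariance issue the abstract mentions.

```python
import numpy as np


def alr(comp: np.ndarray) -> np.ndarray:
    """Additive log-ratio transform with the last part as reference (assumed choice).

    Rows must be strictly positive (add a pseudocount to raw counts first).
    Only ratios enter, so rescaling a row leaves the result unchanged.
    """
    comp = np.asarray(comp, dtype=float)
    return np.log(comp[:, :-1] / comp[:, -1:])


def normal_inverse_reduction(z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical plug-in estimate of span{Sigma^{-1}(mu_c - mu)}.

    Assumes Z | Y=c ~ N(mu_c, Sigma) with Sigma shared across classes.
    Returns a p x (h-1) matrix with orthonormal columns (h = number of classes).
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    grand_mean = z.mean(axis=0)
    # Pooled within-class covariance as the estimate of Sigma.
    resid = np.vstack([z[y == c] - z[y == c].mean(axis=0) for c in classes])
    sigma = resid.T @ resid / (len(z) - len(classes))
    # Columns are class-mean deviations mu_c - mu (p x h); their rank is at most h-1.
    deltas = np.column_stack([z[y == c].mean(axis=0) - grand_mean for c in classes])
    # Directions spanning Sigma^{-1} span(deltas).
    directions = np.linalg.solve(sigma, deltas)
    # Orthonormal basis for the leading h-1 directions.
    u, _, _ = np.linalg.svd(directions, full_matrices=False)
    return u[:, : len(classes) - 1]


if __name__ == "__main__":
    # Simulated example: two classes, class 1 enriched in part 0.
    rng = np.random.default_rng(0)
    n, parts = 200, 5
    y = np.repeat([0, 1], n // 2)
    log_abund = rng.normal(size=(n, parts))
    log_abund[y == 1, 0] += 2.0
    comp = np.exp(log_abund)
    comp /= comp.sum(axis=1, keepdims=True)  # closure to proportions
    basis = normal_inverse_reduction(alr(comp), y)
    scores = alr(comp) @ basis  # reduced data: one SDR coordinate per sample
    print(basis.shape)  # (4, 1)
```

Plotting `scores` against the outcome mirrors the kind of display in SDR coordinates that the abstract describes. The likelihood-based, penalized, and count-based (multinomial, Poisson graphical) versions in the paper are not reproduced here.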
