Journal
BIOSTATISTICS
Volume 22, Issue 4, Pages 687-705
Publisher
OXFORD UNIV PRESS
DOI: 10.1093/biostatistics/kxz060
Keywords
Count data; Penalized likelihood; Prediction; Regression; Sufficient statistic; Visualization
Funding
- National Agency for the Promotion of Science and Technology of Argentina [PICT-2016-2515]
- Binational Scientific Cooperation Program CONICET-National Institutes of Health
- intramural program of the NCI
Recent efforts in characterizing the human microbiome and its relation to chronic diseases have led to advancements in statistical methods for compositional data. Likelihood-based sufficient dimension reduction methods have been developed to find linear combinations that contain all the information in the compositional data regarding an outcome variable. These methods, incorporating variable selection and penalties, address invariance issues arising from the compositional nature of the data and can be applied to continuous or categorical outcomes.
Recent efforts to characterize the human microbiome and its relation to chronic diseases have led to a surge in statistical development for compositional data. We develop likelihood-based sufficient dimension reduction (SDR) methods to find linear combinations that contain all the information in the compositional data on an outcome variable, i.e., are sufficient for modeling and prediction of the outcome. We consider several models for the inverse regression of the compositional vector, or transformations of it, as a function of the outcome. They include normal, multinomial, and Poisson graphical models that allow for complex dependencies among observed counts. These methods yield efficient estimators of the reduction and can be applied to continuous or categorical outcomes. We incorporate variable selection into the estimation via penalties and address important invariance issues arising from the compositional nature of the data. We illustrate and compare our methods and some established methods for analyzing microbiome data in simulations and using data from the Human Microbiome Project. Displaying the data in the coordinate system of the SDR linear combinations allows visual inspection and facilitates comparisons across studies.
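As a rough illustration of what a sufficient reduction under a normal inverse regression can look like, the sketch below may help. It is not the authors' estimator. It assumes a centered log-ratio (CLR) transform with a pseudocount of 0.5, a categorical outcome, and a common within-class covariance, and it uses a pseudo-inverse because the CLR covariance is singular. The function names `clr` and `normal_inverse_regression_reduction` are hypothetical, and the penalties, variable selection, and the multinomial and Poisson graphical models from the abstract are omitted.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a (samples x taxa) count matrix.

    The pseudocount is an assumption made here to avoid log(0).
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def normal_inverse_regression_reduction(z, y):
    """Orthonormal basis for span(pinv(Delta) @ (mu_y - mu)).

    Assumes z | y is normal with class means mu_y and a common
    covariance Delta, estimated by the pooled within-class covariance.
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = z.shape
    grand_mean = z.mean(axis=0)

    within = np.zeros((p, p))
    mean_diffs = []
    for c in classes:
        zc = z[y == c]
        mc = zc.mean(axis=0)
        within += (zc - mc).T @ (zc - mc)
        mean_diffs.append(mc - grand_mean)
    delta = within / (n - len(classes))

    # CLR rows sum to zero, so delta is singular: use a pseudo-inverse
    # with an explicit cutoff to drop the null direction.
    directions = np.linalg.pinv(delta, rcond=1e-10) @ np.array(mean_diffs).T

    # Keep only the numerically non-zero directions (at most K - 1).
    u, s, _ = np.linalg.svd(directions, full_matrices=False)
    rank = int(np.sum(s > 1e-8 * s[0]))
    return u[:, :rank]


# Toy data (simulated, not from the paper): two groups, five taxa.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
rates = np.where(y[:, None] == 0, [20, 10, 10, 10, 10], [5, 10, 10, 10, 25])
counts = rng.poisson(rates)

z = clr(counts)
basis = normal_inverse_regression_reduction(z, y)
scores = z @ basis  # coordinates of each sample in the reduced space
```

Plotting `scores` against the outcome is one way to inspect the data in the coordinate system of the reduction. In this sketch the coefficients in each basis column sum to zero, so each linear combination is a log-contrast, which is one simple way to respect the compositional constraint.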