Article

Boosting multivariate structured additive distributional regression models

Journal

STATISTICS IN MEDICINE

Publisher

WILEY
DOI: 10.1002/sim.9699

Keywords

generalized additive models for location, scale and shape; model-based boosting; multivariate Gaussian distribution; multivariate logit model; multivariate Poisson distribution; semiparametric regression

Within the framework of generalized additive models for location, scale, and shape, we have developed a model-based boosting approach for multivariate distributional regression that models all distribution parameters of a multivariate response simultaneously, conditional on explanatory variables. It is applicable to potentially high-dimensional data and incorporates data-driven variable selection. The approach also allows the association between multiple continuous or discrete outcomes to be modeled through relevant covariates.
We develop a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale, and shape. Our approach enables the simultaneous modeling of all distribution parameters of an arbitrary parametric distribution of a multivariate response conditional on explanatory variables, while being applicable to potentially high-dimensional data. Moreover, the boosting algorithm incorporates data-driven variable selection, taking different types of effects into account. As a special merit of our approach, it allows for modeling the association between multiple continuous or discrete outcomes through the relevant covariates. After a detailed simulation study investigating estimation and prediction performance, we demonstrate the full flexibility of our approach in three diverse biomedical applications. The first is based on high-dimensional genomic cohort data from the UK Biobank, considering a bivariate binary response (chronic ischemic heart disease and high cholesterol). Here, we are able to identify genetic variants that are informative for the association between cholesterol and heart disease. The second application considers the demand for health care in Australia, with the number of consultations and the number of prescribed medications as a bivariate count response. The third application analyses two dimensions of childhood undernutrition in Nigeria as a bivariate response; here, we find that the correlation between the two undernutrition scores differs considerably depending on the child's age and the region the child lives in.
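To make the idea of boosting with built-in variable selection concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a much simpler setting than the abstract describes: a univariate Gaussian response with an identity link for the mean and a log link for the standard deviation, centered linear base-learners only, and a fixed number of iterations instead of tuned early stopping. All names (`boost`, `neg_loglik`, `neg_gradients`, `nu`) and the simulated data are hypothetical. In each iteration, every covariate is fitted separately to the negative gradient of each distribution parameter, and only the single best-fitting covariate of the parameter that lowers the loss most is updated, so covariates that are never chosen keep a coefficient of exactly zero.

```python
import numpy as np

def neg_loglik(y, eta_mu, eta_sigma):
    # Gaussian negative log-likelihood up to a constant, with sigma = exp(eta_sigma).
    return np.sum(eta_sigma + 0.5 * ((y - eta_mu) / np.exp(eta_sigma)) ** 2)

def neg_gradients(y, eta_mu, eta_sigma):
    # Negative gradients of the loss with respect to each additive predictor.
    sigma2 = np.exp(2.0 * eta_sigma)
    r = y - eta_mu
    return {"mu": r / sigma2, "sigma": r ** 2 / sigma2 - 1.0}

def boost(X, y, n_iter=300, nu=0.1):
    # Componentwise gradient boosting for a Gaussian location-scale model (sketch).
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # centered covariates
    col_ss = np.sum(Xc ** 2, axis=0)   # sum of squares per column
    # Intercepts refer to the centered covariates.
    intercept = {"mu": float(y.mean()), "sigma": float(np.log(y.std()))}
    coef = {"mu": np.zeros(p), "sigma": np.zeros(p)}
    eta = {k: np.full(n, intercept[k]) for k in coef}
    losses = [neg_loglik(y, eta["mu"], eta["sigma"])]
    for _ in range(n_iter):
        grads = neg_gradients(y, eta["mu"], eta["sigma"])
        best = None
        for k, u in grads.items():
            a = u.mean()                               # intercept of each simple fit
            b = Xc.T @ (u - a) / col_ss                # slopes of all p simple fits
            rss = np.sum((u - a) ** 2) - b ** 2 * col_ss
            j = int(np.argmin(rss))                    # best-fitting covariate
            trial = dict(eta)
            trial[k] = eta[k] + nu * (a + b[j] * Xc[:, j])
            loss = neg_loglik(y, trial["mu"], trial["sigma"])
            if best is None or loss < best[0]:
                best = (loss, k, j, a, b[j])
        loss, k, j, a, bj = best
        # Update only the selected covariate of the selected parameter.
        intercept[k] += nu * a
        coef[k][j] += nu * bj
        eta[k] = eta[k] + nu * (a + bj * Xc[:, j])
        losses.append(loss)
    return intercept, coef, losses

# Simulated example (assumption): column 0 drives the mean, column 1 the scale.
rng = np.random.default_rng(0)
n, p = 1000, 20
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + np.exp(0.5 * X[:, 1]) * rng.normal(size=n)

intercept, coef, losses = boost(X, y)
print("covariates selected for mu:   ", np.flatnonzero(coef["mu"]))
print("covariates selected for sigma:", np.flatnonzero(coef["sigma"]))
```

In practice the number of iterations would be chosen by cross-validation or a similar criterion rather than fixed, since stopping early is what keeps uninformative covariates out of the model. A multivariate response would add further predictors, for example for a dependence parameter, which are handled in the same loop over distribution parameters.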
