Article

Pursuing sources of heterogeneity in modeling clustered population

Journal

BIOMETRICS
Volume 78, Issue 2, Pages 716-729

Publisher

WILEY
DOI: 10.1111/biom.13434

Keywords

clustering; finite mixture model; generalized lasso; population heterogeneity

Funding

  1. Jiangxi Provincial Natural Science Foundation of China [20202BABL201013]
  2. National Natural Science Foundation of China [11661038]
  3. U.S. National Science Foundation [DMS-1461677, DMS-1613295, IIS-1718798]
  4. U.S. National Institutes of Health [R01-MH112148, R01-MH112148-03S1, R01-MH124740]
  5. U.S. Department of Energy [10006272]

Researchers often face heterogeneous populations with mixed regression relationships in the era of data explosion. In such situations, identifying predictors associated with the outcome and distinguishing true sources of heterogeneity are of interest. A regularized finite mixture effects regression method is proposed for this purpose, achieving both heterogeneity pursuit and feature selection simultaneously with efficiency and consistency.
Researchers often have to deal with heterogeneous populations with mixed regression relationships, increasingly so in the era of data explosion. In such problems, when there are many candidate predictors, it is of interest not only to identify the predictors that are associated with the outcome, but also to distinguish the true sources of heterogeneity, that is, to identify the predictors that have different effects among the clusters and thus are the true contributors to the formation of the clusters. We clarify the concepts of the source of heterogeneity that account for potential scale differences of the clusters and propose a regularized finite mixture effects regression to achieve heterogeneity pursuit and feature selection simultaneously. We develop an efficient algorithm and show that our approach can achieve both estimation and selection consistency. Simulation studies further demonstrate the effectiveness of our method under various practical scenarios. Three applications are presented, namely, an imaging genetics study for linking genetic factors and brain neuroimaging traits in Alzheimer's disease, a public health study for exploring the association between suicide risk among adolescents and their school district characteristics, and a sports analytics study for understanding how the salary levels of baseball players are associated with their performance and contractual status.
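
For readers unfamiliar with the setting, the following is a rough sketch of a generic Gaussian finite mixture regression, written in assumed textbook notation (the symbols K, \pi_k, \beta_k, \sigma_k are illustrative and not necessarily those used in the paper). It is meant only to make the distinction between relevant predictors and sources of heterogeneity concrete.

```latex
% Generic K-component Gaussian mixture regression (assumed notation):
% y is the outcome, x the p-vector of predictors, \phi the standard normal density.
f(y \mid x) \;=\; \sum_{k=1}^{K} \pi_k \,
  \frac{1}{\sigma_k}\,
  \phi\!\left( \frac{y - x^{\top}\beta_k}{\sigma_k} \right),
\qquad \pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1 .

% Under this sketch, predictor j is
%   irrelevant                  if \beta_{j1} = \dots = \beta_{jK} = 0,
%   relevant with common effect if \beta_{j1} = \dots = \beta_{jK} \neq 0,
%   a source of heterogeneity   if \beta_{jk} \neq \beta_{jk'} for some k \neq k'.
```

The abstract notes that the definition should account for scale differences between clusters. One way to do so, assumed here purely for illustration, is to compare the scaled coefficients \beta_{jk}/\sigma_k across clusters rather than the raw \beta_{jk}.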
