Article

Supervised clustering of high-dimensional data using regularized mixture modeling

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 4, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbaa291

Keywords

supervised learning; mixture modeling; disease heterogeneity

Funding

  1. National Science Foundation Div Of Information & Intelligent Systems [1850360]
  2. National Institute of General Medical Sciences [1R01GM131399-01]
  3. Indiana Clinical and Translational Sciences Institute Showalter Young Investigator Award

Identifying relationships between genetic variations and clinical presentations is challenging because a disease can have heterogeneous causes. A novel supervised clustering algorithm, CSMR, was proposed to model the complex relationships between high-dimensional genetic features and phenotypes. It outperformed baseline methods in identifying explanatory features on simulated benchmarks and in recovering distinct subgroups in a drug sensitivity dataset. CSMR is a big data analysis tool for translating clinical representations to the underlying causes of diseases, and it could be of special relevance to personalized medicine.
Identifying relationships between genetic variations and their clinical presentations is challenging because a disease can have heterogeneous causes. It is imperative to uncover the relationship between high-dimensional genetic manifestations and clinical presentations while taking into account the possible heterogeneity of the study subjects. We proposed a novel supervised clustering algorithm based on a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to address the challenges of studying heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm was adapted from the classification expectation maximization algorithm and offers a novel supervised solution to the clustering problem, with substantial improvements in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces in which a subset of features is explanatory of the response variables, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated its superior performance over the other methods: CSMR recapitulated the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms to different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical representations of a disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of disease and could be of special relevance in the growing field of personalized medicine.
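
To make the idea of a classification-EM loop around sparse per-component regressions more concrete, here is a minimal sketch. It is not the authors' CSMR implementation. It assumes Gaussian components, a plain lasso penalty with a fixed strength `lam`, no intercept, and a random initial partition; the published method may differ in its penalty, tuning, and robustness steps. The names `lasso_cd` and `cem_sparse_mixture`, and the simulated data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent:
    minimize (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # correlation of feature j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def cem_sparse_mixture(X, y, k=2, lam=0.05, n_iter=30, seed=0):
    """Hard-assignment (classification) EM for a mixture of sparse regressions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(k, size=n)        # assumed: random initial partition
    betas = np.zeros((k, p))
    sigmas = np.ones(k)
    pis = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # M-step: sparse regression on the samples currently assigned to each component
        for c in range(k):
            idx = labels == c
            if idx.sum() < 2:
                continue                    # keep previous parameters for tiny components
            betas[c] = lasso_cd(X[idx], y[idx], lam)
            r = y[idx] - X[idx] @ betas[c]
            sigmas[c] = max(r.std(), 1e-3)
            pis[c] = idx.mean()
        # E-step and C-step: Gaussian log-posterior (up to a constant), then argmax
        resid = y[:, None] - X @ betas.T    # shape (n, k)
        logp = np.log(pis) - np.log(sigmas) - 0.5 * (resid / sigmas) ** 2
        new_labels = logp.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, betas

# Simulated example (hypothetical data): two subgroups, each driven by a different feature.
rng = np.random.default_rng(1)
n, p = 300, 20
X = rng.normal(size=(n, p))
true_labels = rng.integers(2, size=n)
true_betas = np.zeros((2, p))
true_betas[0, 0] = 3.0
true_betas[1, 1] = -3.0
y = (X * true_betas[true_labels]).sum(axis=1) + 0.1 * rng.normal(size=n)

labels, betas = cem_sparse_mixture(X, y, k=2, lam=0.05)
```

Each row of `betas` holds one component's coefficients, so the nonzero entries indicate which features that subgroup's model treats as explanatory. Like other EM-type procedures, this loop can stop at a poor local optimum, so in practice one would likely try several seeds and choose `lam` by a model-selection criterion.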
