SPLSN: An efficient tool for survival analysis and biomarker selection

Journal

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
Volume 36, Issue 10, Pages 5845-5865

Publisher

WILEY-HINDAWI
DOI: 10.1002/int.22532

Keywords

gene selection; log-sum penalty; network interaction; regularization; self-paced learning

Funding

  1. Macau Science and Technology Development Funds of Macau SAR of China [0002/2019/APD]
  2. MOE (Ministry of Education in China) Project of Humanities and Social Sciences [18YJCZH054]
  3. Natural Science Foundation of Guangdong Province [2018A030307033]
  4. Special Innovation Projects of Universities in Guangdong Province [2018KTSCX205]
  5. National Natural Science Foundation of China [6201101081, 62006155]
  6. Science and Technology Project of Shaoguan City [200811104531028]
  7. Macau University of Science and Technology Foundation [0055/2018/A2]

The study presents SPLSN, a novel sparse Cox regression model that combines self-paced learning with a log-sum absolute network-based penalty for biomarker selection in survival analysis. Results show that SPLSN selects fewer, yet meaningful, biomarkers while achieving the best or equivalent prediction performance compared with other methods.
In genome research, identifying a small number of important survival-related biomarkers is a fundamental problem. The Cox model is a widely used survival analysis technique for studying the relationship between covariates and the survival response. However, existing Cox methods have the following limitations on genomic data: (1) a typical gene expression data set contains tens of thousands of genes, and the solutions produced by current methods may not be sparse enough; (2) the wealth of structural information available about many biological processes, such as regulatory networks and pathways, is often ignored; (3) genomic data are usually highly noisy, and current methods usually do not account for this noise. To alleviate these problems, we study a novel sparse Cox regression model, called SPLSN, which combines self-paced learning (SPL) with a log-sum absolute network-based penalty (Logsum-Net) and is designed specifically for biomarker selection in survival analysis. SPL follows a curriculum-style design: the model is trained on samples that are added gradually, from low noise to high noise, over the course of training. The Logsum-Net penalty encourages smoothness among the coefficients of genes that are adjacent on a given biological network. We compare the proposed method with five alternative approaches in a range of experimental scenarios, including a comprehensive simulation, seven benchmark gene expression data sets, and one large validation data set. Results show that SPLSN selects fewer, yet meaningful, biomarkers and obtains the best or equivalent prediction performance. Moreover, the biological analysis shows that the genes selected by SPLSN might be helpful for tumor treatment.
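The three ingredients named in the abstract (a Cox loss, self-paced sample weights, and a log-sum network penalty) can be illustrated with a minimal NumPy sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the function names, the hard-threshold weighting rule with an `age` parameter, the penalty form `log(1 + |x| / eps)`, and the parameters `lam1`, `lam2`, and `eps` are hypothetical and are not taken from the SPLSN paper.

```python
import numpy as np

def cox_sample_losses(beta, X, time, event):
    """Per-sample terms of the negative log partial likelihood (no tie handling)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    eta = X @ beta
    # at_risk[i, j] is True when sample j is still at risk at time[i]
    at_risk = time[None, :] >= time[:, None]
    # Plain exp is fine for a sketch; real code would guard against overflow
    log_risk = np.log((at_risk * np.exp(eta)[None, :]).sum(axis=1))
    # Censored samples (event == 0) contribute zero loss
    return event * (log_risk - eta)

def spl_weights(losses, age):
    """Assumed hard self-paced rule: keep samples whose loss is below `age`."""
    return (np.asarray(losses) < age).astype(float)

def logsum_net_penalty(beta, edges, lam1, lam2, eps=1.0):
    """Assumed penalty: log-sum sparsity term plus a log-sum term on the
    differences of absolute coefficients across network edges."""
    beta = np.asarray(beta, dtype=float)
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)
    sparsity = np.log1p(np.abs(beta) / eps).sum()
    diff = np.abs(np.abs(beta[edges[:, 0]]) - np.abs(beta[edges[:, 1]]))
    smooth = np.log1p(diff / eps).sum()
    return lam1 * sparsity + lam2 * smooth

def spl_objective(beta, X, time, event, edges, age, lam1, lam2):
    """Weighted Cox loss plus penalty, with weights recomputed from current losses."""
    losses = cox_sample_losses(beta, X, time, event)
    v = spl_weights(losses, age)
    return (v * losses).sum() + logsum_net_penalty(beta, edges, lam1, lam2)
```

In a training loop of this kind, `age` would start small so that only low-loss (presumably low-noise) samples contribute, and would be raised between rounds so that harder samples enter gradually. Comparing absolute values across an edge means two linked genes with coefficients of equal magnitude but opposite sign incur no smoothness cost under this assumed form.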
