Article

SLNL: A novel method for gene selection and phenotype classification

Journal

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
Volume 37, Issue 9, Pages 6283-6304

Publisher

WILEY-HINDAWI
DOI: 10.1002/int.22844

Keywords

gene selection; network-based penalty; phenotype classification; prior knowledge; regularization

Funding

  1. National Natural Science Foundation of China [62102261, 62006155, 6201101081]
  2. Science and Technology Development Fund, Macau SAR [0056/2020/AFJ, 0158/2019/A3]
  3. Special Innovation Projects of Universities in Guangdong Province [2018KTSCX205]
  4. Science and Technology Project of Shaoguan City [200811104531028]

The article proposes a self-paced learning L1/2 absolute network-based logistic regression model (SLNL), which improves the accuracy and interpretability of phenotype prediction and gene marker selection in genomic data analysis through L1/2 regularization and absolute network penalty.
One of the central tasks of genome research is to predict phenotypes and discover important gene biomarkers. However, analyzing genomics data for phenotype prediction and gene marker selection faces three main problems: large p and small n (far more genes than samples), low reproducibility of the selected biomarkers, and high noise. To provide a unified solution that alleviates these problems, we propose a self-paced learning $L_{1/2}$ absolute network-based logistic regression model, called SLNL. Through $L_{1/2}$ regularization, the model obtains a sparser result, which provides better interpretability. The absolute network-based penalty enables the model to integrate feature network knowledge and helps select genes with higher reproducibility. Moreover, the proposed penalty overcomes a drawback of the traditional network penalty, which does not consider the sign of the coefficients. With the self-paced learning strategy, the model accounts for the noise level in gene expression data, reduces the impact of high-noise samples on model training, and provides better prediction accuracy. We compare the proposed method with six alternative approaches in various experimental scenarios, including a comprehensive simulation, four benchmark gene expression datasets, one lung cancer data set, and three lung cancer validation sets. Results show that SLNL identifies fewer but meaningful biomarkers and obtains the best or equivalent prediction performance. Moreover, biological analysis shows that the genes selected by SLNL might be helpful for tumor diagnosis and treatment.
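The three ingredients named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact objective, so everything here is an assumption rather than the authors' implementation: the function names are hypothetical, the network penalty is taken to be a quadratic form of the normalized graph Laplacian applied to the absolute coefficients, and the self-paced weights use the common hard-threshold scheme (a sample is kept when its loss falls below an "age" parameter).

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian of a gene network (assumed form)."""
    deg = adj.sum(axis=1)
    nz = deg > 0
    inv_sqrt = np.zeros_like(deg, dtype=float)
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    # Isolated genes get a zero row/column; connected genes get 1 on the diagonal.
    return np.diag(nz.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def sample_losses(X, y, beta):
    """Per-sample logistic negative log-likelihood, labels y in {0, 1}."""
    z = X @ beta
    return np.logaddexp(0.0, z) - y * z

def self_paced_weights(losses, age):
    """Hard self-paced weights: keep only samples whose loss is below `age`."""
    return (losses < age).astype(float)

def slnl_style_objective(X, y, beta, lap, lam1, lam2, age):
    """Illustrative objective: weighted fit + L1/2 term + absolute network term."""
    losses = sample_losses(X, y, beta)
    v = self_paced_weights(losses, age)
    fit = v @ losses - age * v.sum()             # self-paced weighted loss
    l_half = lam1 * np.sqrt(np.abs(beta)).sum()  # L1/2 sparsity penalty
    abs_beta = np.abs(beta)
    network = lam2 * (abs_beta @ lap @ abs_beta) # penalty on |beta|, sign-free
    return fit + l_half + network

# Two linked genes with equal-magnitude, opposite-sign coefficients.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
lap = normalized_laplacian(adj)
beta = np.array([1.0, -1.0])
signed_penalty = beta @ lap @ beta                    # traditional form
absolute_penalty = np.abs(beta) @ lap @ np.abs(beta)  # absolute form
```

In this toy case the signed quadratic form penalizes the two linked genes heavily because their coefficients have opposite signs, while the absolute form treats them as equally important and adds no penalty. This is one reading of the sign drawback the abstract mentions.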
