Article

MSPJ: Discovering potential biomarkers in small gene expression datasets via ensemble learning

Journal

Computational and Structural Biotechnology Journal

Publisher

ELSEVIER
DOI: 10.1016/j.csbj.2022.07.022

Keywords

Small sample size; Random sampling; Feature selection; Differentially expressed genes; Machine learning

Funding

  1. Nursery Project of Army Medical University [2019R054]
  2. Natural Science Foundation of Chongqing, China [CSTC2019JCYJ-MSXMX0527]
  3. Open Fund of Yunnan Key Laboratory of Plant Reproductive Adaptation and Evolutionary Ecology, Yunnan University, Chongqing Technology Innovation and Application Development Special Key Project [cstc2019jscx-dxwtBX0010]
  4. Science and Technology Research Program of Chongqing Municipal Education Commission [KJQN202100538]

The study proposes MSPJ, a new machine learning approach for identifying differentially expressed genes (DEGs) in small gene expression datasets. MSPJ performed best on most of the small datasets tested and may help advance research on the molecular mechanisms underlying complex diseases or phenotypes.
In transcriptomics, differentially expressed genes (DEGs) provide fine-grained phenotypic resolution for comparisons between groups and insights into the molecular mechanisms underlying the pathogenesis of complex diseases or phenotypes. The robust detection of DEGs from large datasets is well established. However, owing to various limitations (e.g., the low availability of samples for some diseases or limited research funding), experiments frequently rely on small sample sizes. Methods that screen reliable and stable features are therefore urgently needed for analyses with limited sample sizes. In this study, MSPJ, a new machine learning approach for identifying DEGs, was proposed to mitigate the reduced power and improve the stability of DEG identification in small gene expression datasets. This ensemble learning-based method consists of three algorithms: an improved multiple random sampling with meta-analysis, SVM-RFE (support vector machine recursive feature elimination), and a permutation test. MSPJ was compared with ten classical methods on 94 simulated datasets and in a large-scale benchmark of 165 real datasets. The results showed that, among these methods, MSPJ had the best performance in most small gene expression datasets, especially those with sample sizes below 30. In summary, the MSPJ method enables effective feature selection for robust DEG identification in small transcriptome datasets and is expected to expand research on the molecular mechanisms underlying complex diseases or phenotypes.

© 2022 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
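
The abstract names the three components of MSPJ but gives no implementation details. As a rough illustration only, and not the authors' method, the sketch below combines two of the underlying ideas, repeated random subsampling and a permutation test, using only the Python standard library. SVM-RFE and the meta-analysis step are deliberately left out. The function names, the 80% subsample fraction, the 0.05 threshold, the number of rounds, and the toy expression values are all assumptions made for this example.

```python
import random
import statistics


def permutation_pvalue(group_a, group_b, n_perm=200, rng=None):
    """Two-sided permutation p-value for the difference in group means."""
    rng = rng or random.Random(0)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the samples
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def selection_frequency(expr_case, expr_ctrl, n_rounds=30, frac=0.8,
                        alpha=0.05, seed=0):
    """Fraction of random subsamples in which each gene passes the test.

    expr_case / expr_ctrl map a gene name to its expression values in the
    case and control groups (hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    genes = list(expr_case)
    n_case = len(expr_case[genes[0]])
    n_ctrl = len(expr_ctrl[genes[0]])
    k_case = max(2, int(frac * n_case))
    k_ctrl = max(2, int(frac * n_ctrl))
    counts = {g: 0 for g in genes}
    for _ in range(n_rounds):
        # Draw one subsample of samples and reuse it for every gene.
        idx_case = rng.sample(range(n_case), k_case)
        idx_ctrl = rng.sample(range(n_ctrl), k_ctrl)
        for g in genes:
            a = [expr_case[g][i] for i in idx_case]
            b = [expr_ctrl[g][i] for i in idx_ctrl]
            if permutation_pvalue(a, b, rng=rng) < alpha:
                counts[g] += 1
    return {g: counts[g] / n_rounds for g in genes}


# Toy data (invented): 8 cases and 8 controls, one shifted gene, one null gene.
controls = [5.0, 5.1, 4.9, 4.95, 5.05, 5.2, 4.8, 5.0]
expr_ctrl = {"g_shifted": controls, "g_null": controls}
expr_case = {
    "g_shifted": [8.1, 7.9, 8.0, 8.2, 7.8, 8.05, 7.95, 8.0],
    "g_null": [5.1, 4.9, 5.0, 5.2, 4.8, 5.05, 4.95, 5.0],
}

freq = selection_frequency(expr_case, expr_ctrl)
for gene, f in freq.items():
    print(gene, f)
```

A gene that passes the test in nearly every subsample is a more stable candidate than one that passes only occasionally, which is the intuition behind resampling-based selection for small datasets. In a fuller pipeline, a ranking step such as SVM-RFE and a formal method for combining evidence across subsamples would replace the simple frequency count used here.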
