4.7 Article

iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor

Journal

BIOINFORMATICS
Volume 37, Issue 8, Pages 1060-1067

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btaa914

Keywords

-

Funding

  1. Basic Research Program of Science and Technology of Shenzhen [JCYJ20180306172637807]
  2. China Postdoctoral Science Foundation [2019M662770]
  3. National Natural Science Foundation of China [61472333, 61772441, 61472335, 61272152, 41476118, 61902125, 62002111, 61872309, 61972138]
  4. Natural Science Foundation of Hunan province [2019JJ50187]
  5. Scientific Research Project of Hunan Education Department [18B209]


The study proposes a two-layer predictor, 'iEnhancer-XG', for recognizing enhancers and predicting their strength, using XGBoost as the base classifier with five feature extraction methods. Ensemble learning is applied to improve prediction accuracy, and SHapley Additive exPlanations (SHAP) is used to make the predictions interpretable and more credible.
Motivation: Enhancers are non-coding DNA fragments whose positions are highly variable and freely scattered across the genome. They play an important role in controlling gene expression. As machine learning has become more widely used for identifying enhancers, a number of bioinformatic tools have been developed. Although several models for identifying enhancers and their strength have been proposed, their accuracy and efficiency still leave room for improvement.

Results: We propose a two-layer predictor called 'iEnhancer-XG'. It comprises a first-layer predictor (for identifying enhancers) and a second-layer classifier (for predicting their strength). It uses XGBoost as the base classifier together with five feature extraction methods: k-Spectrum Profile, Mismatch k-tuple, Subsequence Profile, Position-Specific Scoring Matrix (PSSM) and Pseudo Dinucleotide Composition (PseDNC). Each method produces an independent output, and the resulting feature vectors are fused through ensemble learning. We apply SHapley Additive exPlanations (SHAP) to make the otherwise black-box machine learning models interpretable and to improve their credibility. The accuracies of the ensemble learning method are 0.811 (first layer) and 0.657 (second layer). Rigorous 10-fold cross-validation confirms that the proposed method is significantly better than existing methods.
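As a rough illustration of two ideas named in the abstract, the sketch below computes a k-spectrum profile (k-mer frequencies) for a DNA sequence and chains two classifiers into a two-layer decision. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the frequency normalization, the strong/weak labels, and the stand-in classifiers `is_enhancer` and `is_strong` are all hypothetical; in the paper the classifiers are XGBoost models trained on five fused feature sets.

```python
from itertools import product


def k_spectrum_profile(seq, k=2, alphabet="ACGT"):
    """Return k-mer frequencies of seq, ordered lexicographically by k-mer.

    Assumption: counts are normalized to frequencies; the paper may use
    raw counts or a different normalization.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with symbols outside the alphabet
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def two_layer_predict(seq, is_enhancer, is_strong, k=2):
    """Hypothetical two-layer cascade.

    Layer 1 decides enhancer vs. non-enhancer; layer 2 runs only on
    predicted enhancers and decides their strength. Both classifiers are
    plain callables taking a feature vector and returning a bool.
    """
    features = k_spectrum_profile(seq, k=k)
    if not is_enhancer(features):
        return "non-enhancer"
    return "strong enhancer" if is_strong(features) else "weak enhancer"
```

For example, `k_spectrum_profile("ACGT", k=1)` gives equal frequencies for the four nucleotides, and any pair of callables (such as wrappers around trained models) can be passed to `two_layer_predict`.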
