Article

iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree

Journal

MATHEMATICAL BIOSCIENCES AND ENGINEERING
Volume 18, Issue 6, Pages 8797-8814

Publisher

AMER INST MATHEMATICAL SCIENCES-AIMS
DOI: 10.3934/mbe.2021434

Keywords

identification; enhancers; multiple features; gradient boosting decision tree

Funding

  1. National Natural Science Foundation of China [12101480]
  2. Natural Science Basic Research Program of Shaanxi [2021JM-115, 2021JM-444]
  3. Fundamental Research Funds for the Central Universities [JB210715]


In this study, a model named iEnhancer-MFGBDT was developed to identify enhancers and their strength by fusing multiple features and classifying them with a gradient boosting decision tree. On the benchmark dataset, the model achieved accuracies of 78.67% for identifying enhancers and 66.04% for identifying their strength, which supports its usefulness as a tool for enhancer identification.
An enhancer is a non-coding DNA fragment that can be bound by proteins to activate transcription of a gene, and hence plays an important role in regulating gene expression. Enhancer identification is very challenging and more complicated than the identification of other genetic factors, because enhancers vary in position and are freely scattered. In addition, genetic variation in enhancers has been shown to be related to human diseases. Therefore, identifying enhancers and their strength has important biological meaning. In this paper, a novel model named iEnhancer-MFGBDT is developed to identify enhancers and their strength by fusing multiple features and using a gradient boosting decision tree (GBDT). The features include k-mer and reverse complement k-mer nucleotide composition, based on the DNA sequence, and second-order moving average, normalized Moreau-Broto auto-cross correlation and Moran auto-cross correlation, based on the dinucleotide physical structural property matrix. GBDT is then used first to select features and then to perform classification. The accuracies reach 78.67% and 66.04% for identifying enhancers and their strength on the benchmark dataset, respectively. Compared with other models, the results show that our model is a useful and effective intelligent tool for identifying enhancers and their strength. The datasets and source codes are available at https://github.com/shengli0201/iEnhancer-MFGBDT1.
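The two sequence-based features named in the abstract, k-mer and reverse complement k-mer nucleotide composition, can be illustrated with a minimal sketch. It is not taken from the authors' source code: the function names, the normalization by the number of windows, and the choice of the lexicographically smaller k-mer as the representative of each reverse-complement pair are assumptions made here for illustration, and the values of k used in the paper are not stated in the abstract.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string, e.g. AAC -> GTT."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_composition(seq, k):
    """Normalized frequency of each of the 4**k possible k-mers in seq.

    Windows containing characters other than A, C, G, T are skipped
    (an assumption of this sketch).
    """
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


def rc_kmer_composition(seq, k):
    """k-mer composition with each k-mer merged with its reverse complement.

    The lexicographically smaller string of each pair is used as the key,
    which is a convention chosen for this sketch.
    """
    merged = {}
    for kmer, freq in kmer_composition(seq, k).items():
        key = min(kmer, reverse_complement(kmer))
        merged[key] = merged.get(key, 0.0) + freq
    return merged


if __name__ == "__main__":
    example = "ACGT"  # toy sequence, not from the benchmark dataset
    print(kmer_composition(example, 2)["AC"])
    print(rc_kmer_composition(example, 2)["AC"])
```

For k = 2 the plain composition has 16 entries, while the reverse complement version has 10, because each k-mer shares an entry with its reverse complement. Feature vectors of this kind could then be concatenated with the correlation-based features and passed to a GBDT implementation for feature selection and classification.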

