Journal
INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES
Volume 14, Issue 3, Pages 697-711
Publisher
SPRINGER HEIDELBERG
DOI: 10.1007/s12539-022-00520-4
Keywords
Promoter; Machine learning; Deep learning; Feature fusion; Feature selection; Deep Forest
Funding
- National Natural Science Foundation of China [61972322]
- Natural Science Foundation of Shaanxi Province [2021JM-110]
In this study, we proposed a novel two-layer predictor, PredPromoter-MF(2L), based on multi-source feature fusion and ensemble learning, and demonstrated its superiority in promoter prediction compared to existing methods.
Promoters, short DNA sequences, play vital roles in initiating gene transcription. However, identifying promoters in a high-throughput manner with conventional experimental techniques remains a challenge. To address this, several computational predictors based on machine learning models have been developed, but their performance remains unsatisfactory. In this study, we proposed a novel two-layer predictor, called PredPromoter-MF(2L), based on multi-source feature fusion and ensemble learning. PredPromoter-MF(2L) was built on various deep features learned by a pre-trained deep learning network model, together with sequence-derived features. Feature selection based on XGBoost was applied to reduce the dimensionality of the fused features, and a cascade deep forest model was trained on the selected feature subset for promoter prediction. The results of both fivefold cross-validation and an independent test demonstrated that PredPromoter-MF(2L) outperformed state-of-the-art methods.
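The abstract does not give implementation details, so the following Python sketch only illustrates the general shape of the fusion and selection steps under stated assumptions. Normalized k-mer frequencies stand in for the (unspecified) sequence-derived features. The "deep" vector is a hypothetical placeholder for embeddings from the pre-trained network. The importance scores are assumed to come from an already trained XGBoost model (for example, the `feature_importances_` attribute of its scikit-learn wrapper) rather than being computed here. The helper names (`kmer_frequencies`, `fuse`, `select_top_k`, `apply_selection`) are illustrative, and the cascade deep forest classifier is not reproduced.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 2) -> list[float]:
    """Normalized k-mer counts over the alphabet ACGT.

    Assumption: one possible sequence-derived encoding, not necessarily
    the one used by PredPromoter-MF(2L).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        if kmer in index:  # skip windows containing N or other symbols
            counts[index[kmer]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def fuse(*feature_groups: list[float]) -> list[float]:
    """Multi-source fusion by simple concatenation of per-source vectors."""
    fused: list[float] = []
    for group in feature_groups:
        fused.extend(group)
    return fused


def select_top_k(importances: list[float], k: int) -> list[int]:
    """Indices of the k most important features, returned in original order.

    Assumption: importances come from a trained XGBoost model.
    """
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def apply_selection(vector: list[float], indices: list[int]) -> list[float]:
    """Keep only the selected feature columns of one sample."""
    return [vector[i] for i in indices]


if __name__ == "__main__":
    seq_feats = kmer_frequencies("ACGTAC", k=1)  # 4 mononucleotide features
    deep_feats = [0.3, -1.2]  # hypothetical embedding from a pre-trained network
    fused = fuse(deep_feats, seq_feats)  # 6 fused features
    importances = [0.05, 0.40, 0.10, 0.00, 0.30, 0.15]  # hypothetical scores
    idx = select_top_k(importances, 3)  # -> [1, 4, 5]
    print(idx, apply_selection(fused, idx))
```

Returning the selected indices in their original order means the same index list can be applied to every sample before the reduced vectors are passed to the downstream classifier.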