Article

Supervised promoter recognition: a benchmark framework

BMC BIOINFORMATICS (2022)

Journal

BMC BIOINFORMATICS

Volume 23, Issue 1, Pages -

Publisher

BMC

DOI: 10.1186/s12859-022-04647-5

Keywords

Machine learning; Deep learning; Bioinformatics; Promoter recognition

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Funding

Google Cloud academic research grant
Azure Sponsorship through the Microsoft AI for Health Azure grant
University of Victoria graduate fellowship
NSERC Discovery Grants

Smart summary

This article presents a framework called SUPR REF, which streamlines the process of training, validating, testing, and comparing promoter recognition models. Using biologically relevant benchmark datasets, the authors showcase the framework and evaluate the performance of previously published models on new benchmark datasets. The results indicate that deep learning methods for promoter recognition in eukaryotic genomic sequences are not yet sufficiently reliable.

Abstract

Motivation: Deep learning has become a prevalent method for identifying genomic regulatory sequences such as promoters. In a number of recent papers, the performance of deep learning models has continually been reported as an improvement over alternatives for sequence-based promoter recognition. However, the reported improvements do not account for the different datasets on which the models are evaluated. The lack of a consensus dataset and benchmarking procedure has made each model's true performance difficult to assess and compare.

Results: We present a framework called Supervised Promoter Recognition Framework ('SUPR REF') capable of streamlining the complete process of training, validating, testing, and comparing promoter recognition models in a systematic manner. SUPR REF includes the creation of biologically relevant benchmark datasets to be used in the evaluation of deep learning promoter recognition models. We showcase this framework by comparing the models' performance on alternative datasets, and properly evaluate previously published models on new benchmark datasets. Our results show that the reliability of deep learning ab initio promoter recognition models on eukaryotic genomic sequences is not yet sufficient, as overall performance remains low. These results were obtained on a subset of promoters, the well-known RNA Polymerase II core promoters. Furthermore, given the observational nature of these data, cross-validation results from small promoter datasets need to be interpreted with caution.
