Article

Prediction of the transcription factor binding sites with meta-learning

METHODS (2022)

Journal

METHODS

Volume 203, Issue -, Pages 207-213

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.ymeth.2022.04.010

Keywords

Convolution neural network; Transcription factor binding sites; Meta learning; Noisy labels data

Categories

Biochemical Research Methods; Biochemistry & Molecular Biology

Funding

National Natural Science Foundation of China [61873202, 62173271, 61621003]
Strategic Priority Research Program of the Chinese Academy of Sciences (CAS) [XDPB17]
National Ten Thousand Talent Program for Young Top-notch Talents [QYZDB-SSW-SYS008]
CAS Frontier Science Research Key Project for Top Young Scientist


AI Summary

This study proposes a meta-learning-based CNN method (MLCNN) for accurately identifying TFBSs from ChIP-seq data. Guided by a small amount of unbiased meta-data, MLCNN adaptively learns a weighting function that counteracts the influence of biased training data on the classifier. Experimental results show that MLCNN outperforms other CNN methods and can detect and suppress noisy samples.

Abstract

With the accumulation of ChIP-seq data, convolutional neural network (CNN)-based methods have been proposed for predicting transcription factor binding sites (TFBSs). However, biological experimental data are noisy, yet they are often treated as ground truth for both training and testing. In particular, existing classification methods ignore the false positives and false negatives caused by errors in the peak-calling stage, and can therefore easily overfit to biased training data. This leads to inaccurate identification and an inability to reveal the rules governing protein-DNA binding. To address this issue, we propose a meta-learning-based CNN method (TFBS_MLCNN, or MLCNN for short) that suppresses the influence of noisy-label data and accurately recognizes TFBSs from ChIP-seq data. Guided by a small amount of unbiased meta-data, MLCNN adaptively learns an explicit weighting function from ChIP-seq data while simultaneously updating the classifier's parameters. The weighting function counteracts the influence of biased training data on the classifier by assigning a weight to each sample according to its training loss. Experimental results on 424 ChIP-seq datasets show that MLCNN not only outperforms existing state-of-the-art CNN methods, but also detects noisy samples and suppresses them by assigning them small weights. This suppression of noisy samples can be seen by visualizing the samples' weights. Several case studies demonstrate that MLCNN performs better than the other methods.
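The abstract says only that the weighting function assigns each sample a weight according to its training loss. The sketch below illustrates that idea and is not the authors' implementation: the sigmoid-of-affine form, the hand-picked parameters `a` and `b`, and the function names are all assumptions made for illustration. In a meta-learning setup such parameters would presumably be learned from the unbiased meta-data rather than fixed.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Per-sample loss for a predicted binding probability p and a label y (0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sample_weight(loss, a=-2.0, b=1.0):
    """Hypothetical weighting function: a sigmoid of an affine map of the loss.

    a and b are fixed by hand here (an assumption); with a < 0,
    samples with a large training loss receive a small weight.
    """
    return 1.0 / (1.0 + math.exp(-(a * loss + b)))


def weighted_training_loss(probs, labels):
    """Return the weighted mean of the per-sample losses and the weights used."""
    losses = [binary_cross_entropy(p, y) for p, y in zip(probs, labels)]
    weights = [sample_weight(l) for l in losses]
    total = sum(w * l for w, l in zip(weights, losses)) / len(losses)
    return total, weights


# Toy batch (made-up numbers): the last sample is labelled positive but is
# predicted confidently negative, as a mislabelled peak might be.
probs = [0.9, 0.2, 0.8, 0.05]
labels = [1, 0, 1, 1]
loss, weights = weighted_training_loss(probs, labels)
```

With these assumed parameters the high-loss last sample ends up with a much smaller weight than the well-fitted ones, so it contributes little to the training objective. This mirrors the suppression behaviour the abstract describes, under the stated assumptions only.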
