Journal
BMC BIOLOGY
Volume 20, Issue 1, Pages -
Publisher
BMC
DOI: 10.1186/s12915-022-01426-9
Keywords
cis-regulatory modules; Enhancers; Functional states; Machine-learning; Predictions
Funding
- US National Science Foundation [DBI1661332]
This study proposes a two-step strategy to accurately predict the distribution of CRMs in the genome and their functional states in various cell/tissue types by integrating ChIP-seq data and using machine learning methods. The results show that functional states of CRMs can be accurately predicted using only 1 to 4 epigenetic marks, and the approach is more cost-effective than existing methods. The study also reveals common epigenetic rules for defining functional states of CRMs in humans and mice.
Background: Predicting cis-regulatory modules (CRMs) in a genome and predicting their functional states in various cell/tissue types of the organism are two related and challenging computational tasks. Most current methods attempt to achieve both simultaneously using data for multiple epigenetic marks in a cell/tissue type. Though conceptually attractive, they suffer from high false discovery rates and limited applicability. To fill these gaps, we proposed a two-step strategy: first predict a map of CRMs in the genome, and then predict the functional states of all the CRMs in various cell/tissue types of the organism. We have recently developed an algorithm for the first step that predicts CRMs in a genome more accurately and completely than existing methods by integrating numerous transcription factor ChIP-seq datasets from the organism. Here, we present machine-learning methods for the second step.
Results: We showed that the functional states of all the CRMs in the genome in a given cell/tissue type could be accurately predicted by a variety of machine-learning classifiers using data for only 1 to 4 epigenetic marks. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on one human cell/tissue type can accurately predict functional states of CRMs in other human cell/tissue types as well as in mouse cell/tissue types, and vice versa. Therefore, the epigenetic code that defines functional states of CRMs in various cell/tissue types is universal, at least in humans and mice. Moreover, we found that tens to hundreds of thousands of CRMs were active in a given human or mouse cell/tissue type, and that up to 99.98% of them were reutilized in different cell/tissue types, while as few as 0.02% were unique to a cell/tissue type and might define that cell/tissue type.
Conclusions: Our two-step approach can accurately predict functional states in any cell/tissue type of all the CRMs in the genome using data of only 1 to 4 epigenetic marks. Our approach is also more cost-effective than existing methods that typically use data of more epigenetic marks. Our results suggest common epigenetic rules for defining functional states of CRMs in various cell/tissue types in humans and mice.
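The second step described above (classifying each already-predicted CRM as active or inactive from a small number of epigenetic mark signals, then applying the trained model to a different cell/tissue type) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the data are synthetic, the two features stand for two unnamed hypothetical marks, "cell type A" and "cell type B" are simply two independent random samples, and plain logistic regression is used as one example of the "variety of machine-learning classifiers" the abstract mentions.

```python
import math
import random


def make_synthetic_crms(n, rng):
    """Simulate n CRMs with signal levels of 2 hypothetical epigenetic marks.

    Assumption for illustration only: active CRMs have higher mean signal
    on both marks than inactive CRMs. Real ChIP-seq signals are not Gaussian.
    """
    features, labels = [], []
    for _ in range(n):
        active = rng.random() < 0.5
        mean = 2.0 if active else 0.0
        features.append([rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)])
        labels.append(1 if active else 0)
    return features, labels


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Label a CRM active (1) when its predicted probability is >= 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)
# "Cell type A" is used for training, "cell type B" only for evaluation.
X_a, y_a = make_synthetic_crms(500, rng)
X_b, y_b = make_synthetic_crms(500, rng)

w, b = train_logistic(X_a, y_a)
acc = accuracy(y_b, predict(w, b, X_b))
print(f"accuracy on held-out cell type B: {acc:.3f}")
```

Because both synthetic samples are drawn from the same distribution, this sketch only shows the train-on-one, predict-on-another workflow; it says nothing about how well real models transfer between cell/tissue types or species.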