4.8 Article

Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice

Journal

BMC BIOLOGY
Volume 20, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12915-022-01426-9

Keywords

cis-regulatory modules; Enhancers; Functional states; Machine-learning; Predictions

Funding

  1. US National Science Foundation [DBI1661332]

This study proposes a two-step strategy to accurately predict the distribution of CRMs in the genome and their functional states in various cell/tissue types by integrating ChIP-seq data and using machine learning methods. The results show that functional states of CRMs can be accurately predicted using only 1 to 4 epigenetic marks, and the approach is more cost-effective than existing methods. The study also reveals common epigenetic rules for defining functional states of CRMs in humans and mice.
Background: Predicting cis-regulatory modules (CRMs) in a genome and their functional states in various cell/tissue types of the organism are two related, challenging computational tasks. Most current methods attempt to achieve both simultaneously using data for multiple epigenetic marks in a cell/tissue type. Though conceptually attractive, they suffer from high false discovery rates and limited applications. To fill these gaps, we propose a two-step strategy: first predict a map of CRMs in the genome, and then predict the functional states of all the CRMs in various cell/tissue types of the organism. We have recently developed an algorithm for the first step that predicts CRMs in a genome more accurately and completely than existing methods by integrating numerous transcription factor ChIP-seq datasets in the organism. Here, we present machine-learning methods for the second step.
Results: We show that the functional states in a cell/tissue type of all the CRMs in the genome can be accurately predicted by a variety of machine-learning classifiers using data for only 1 to 4 epigenetic marks. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on a cell/tissue type in humans can accurately predict functional states of CRMs in different cell/tissue types of humans as well as of mice, and vice versa. Therefore, the epigenetic code that defines functional states of CRMs in various cell/tissue types is universal, at least in humans and mice. Moreover, we found that tens to hundreds of thousands of CRMs were active in a given human or mouse cell/tissue type, and up to 99.98% of them were reutilized in different cell/tissue types, while as few as 0.02% of them were unique to a cell/tissue type and might define that cell/tissue type.
Conclusions: Our two-step approach can accurately predict the functional states in any cell/tissue type of all the CRMs in the genome using data for only 1 to 4 epigenetic marks. Our approach is also more cost-effective than existing methods, which typically use data for more epigenetic marks. Our results suggest common epigenetic rules for defining functional states of CRMs in various cell/tissue types in humans and mice.
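
The second step described above (train a classifier on epigenetic mark signals at CRMs in one cell/tissue type, then predict functional states in another) can be illustrated with a toy sketch. This is not the authors' implementation: the data are synthetic, the four marks are unnamed placeholders, the helper names (`make_cell_type`, `train_logistic`, `predict`, `accuracy`) are hypothetical, and a plain logistic regression merely stands in for the "variety of machine-learning classifiers" mentioned in the abstract.

```python
import math
import random

N_MARKS = 4  # assumption: use 4 marks, the upper end of the "1 to 4" range


def make_cell_type(n_crms, seed):
    """Synthetic data: active CRMs (label 1) get higher mark signal on average."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_crms):
        label = rng.randint(0, 1)
        mean = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(mean, 1.0) for _ in range(N_MARKS)])
        y.append(label)
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Predicted probability that a CRM with mark signals x is active.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, epochs=200, lr=0.1):
    """Full-batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * N_MARKS, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * N_MARKS, 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi
            for j in range(N_MARKS):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    w, b = model
    return [1 if score(w, b, xi) >= 0.5 else 0 for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# "Train" on one synthetic cell type, evaluate on a different one.
X_train, y_train = make_cell_type(500, seed=0)
X_test, y_test = make_cell_type(500, seed=1)
model = train_logistic(X_train, y_train)
acc = accuracy(y_test, predict(model, X_test))
print(f"accuracy on the held-out synthetic cell type: {acc:.3f}")
```

Because both synthetic cell types are drawn from the same distribution, the transfer here works by construction; the sketch only shows the shape of the train-on-one, predict-on-another evaluation, not evidence for the paper's finding.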
