Article

Modeling chromatin state from sequence across angiosperms using recurrent convolutional neural networks

Journal

PLANT GENOME
Volume 15, Issue 3

Publisher

WILEY
DOI: 10.1002/tpg2.20249

Funding

  1. NSF Graduate Research Fellowship [DGE-1650441]
  2. NSF Postdoctoral Fellowship in Biology [DBI-1905869]
  3. Australian Research Council (ARC) Discovery Early Career Award [DE200101748]
  4. USDA-ARS [NSF : IOS-1934384]

Accessible chromatin regions are critical for gene regulation, and predicting them from sequence in plants has been challenging. By training a deep learning model, we successfully predicted chromatin accessibility and methylation levels across multiple angiosperm species, revealing the conservation of chromatin mechanisms. We also identified important transcription factor families.
Accessible chromatin regions (ACRs) are critical components of gene regulation, but modeling them directly from sequence remains challenging, especially in plants, whose mechanisms of chromatin remodeling are less well understood than those of animals. We trained an existing deep-learning architecture, DanQ, on data from 12 angiosperm species to predict the chromatin accessibility in leaf tissue of sequence windows, both within and across species. We also trained DanQ on DNA methylation data from 10 angiosperms, because unmethylated regions have been shown to overlap significantly with ACRs in some plants. The across-species models perform comparably to, or even better than, a model trained within species, suggesting strong conservation of chromatin mechanisms across angiosperms. Testing a maize (Zea mays L.) held-out model on a multi-tissue chromatin accessibility panel revealed that our models are best at predicting constitutively accessible chromatin regions, with performance diminishing as cell-type specificity increases. Using a combination of interpretation methods, we ranked JASPAR motifs by their importance to each model and found that the TCP and AP2/ERF transcription factor (TF) families consistently ranked highly. We embedded the top three JASPAR motifs for each model at all possible positions on both strands of our sequence window and observed position- and strand-specific patterns in their importance to the model. With our publicly available across-species 'a2z' model, it is now feasible to predict the chromatin accessibility and methylation landscape of any angiosperm genome.
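
The motif-embedding analysis described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the motif string "GGACC", the all-A background window, and the scoring function are hypothetical stand-ins chosen for illustration. In practice the score would come from a trained model, and the motif would be derived from a JASPAR matrix rather than a fixed string.

```python
# Illustrative sketch only: all names below are hypothetical, and toy_score
# is a placeholder for a trained sequence model, not the a2z model.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def embed_motif(background, motif, position):
    """Overwrite part of `background` with `motif`, starting at `position`."""
    return background[:position] + motif + background[position + len(motif):]


def scan_motif(background, motif, score_fn):
    """Embed the motif at every valid position on both strands.

    Returns (position, strand, delta) tuples, where delta is the change in
    score relative to the unmodified background window.
    """
    baseline = score_fn(background)
    results = []
    for strand, oriented in (("+", motif), ("-", reverse_complement(motif))):
        for pos in range(len(background) - len(motif) + 1):
            delta = score_fn(embed_motif(background, oriented, pos)) - baseline
            results.append((pos, strand, delta))
    return results


def toy_score(seq):
    """Placeholder score: counts forward-strand occurrences of 'GGACC'."""
    return seq.count("GGACC")


window = "A" * 20  # assumed background; a real window would be genomic sequence
effects = scan_motif(window, "GGACC", toy_score)
```

Grouping `effects` by position and strand gives the kind of profile in which position- and strand-specific patterns can be inspected. With this toy scorer only the forward-strand embeddings change the score, which is a property of the placeholder and not a result from the paper.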
