
Universal annotation of the human genome through integration of over a thousand epigenomic datasets

GENOME BIOLOGY (2022)

Journal

GENOME BIOLOGY

Volume 23, Issue 1

Publisher

BMC

DOI: 10.1186/s13059-021-02572-z

Funding

US National Institutes of Health [DP1DA044371, U01MH105578, UH3NS104095]
US National Science Foundation [1254200, 2125664]
Kure-IT award from Kure It Cancer Research
Rose Hills Innovator Award
UCLA Jonsson Comprehensive Cancer Center
Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research Ablon Scholars Program
National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [2125664]

Summary

This study applies a stacked modeling approach at large scale to generate a universal chromatin state annotation based on datasets from multiple cell types. Compared to per-cell-type annotations, the full-stack annotation directly differentiates constitutive from cell type-specific activity and predicts the locations of external genomic annotations more accurately.

Abstract

Background: Genome-wide maps of chromatin marks, such as histone modifications and open chromatin sites, provide valuable information for annotating the noncoding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative stacked modeling approach was previously suggested, in which chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, this approach had previously been applied only for small-scale, specialized purposes. Large-scale applications of stacked modeling have previously posed scalability challenges.

Results: Using a version of ChromHMM enhanced for large-scale applications, we apply the stacked modeling approach to produce a universal chromatin state annotation of the human genome from over 1000 datasets spanning more than 100 cell types; we denote the learned model the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we use to characterize each state. Compared to per-cell-type annotations, the full-stack annotation directly differentiates constitutive from cell type-specific activity and is more predictive of the locations of external genomic annotations.

Conclusions: The full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect it to be a useful resource that complements existing per-cell-type annotations for studying the noncoding human genome.
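The stacking idea described in the Background can be made concrete with a minimal sketch. It only shows how binarized mark calls from several cell types could be concatenated into one feature vector per genomic bin, and how such a stacked vector exposes whether a mark is present in all or only some cell types. It is not ChromHMM's implementation and does not train a model; the function names, the toy data, and the assumption that every cell type reports the same marks over the same bins are hypothetical simplifications.

```python
from typing import Dict, List

# One row per genomic bin, one 0/1 entry per chromatin mark (1 = mark called present).
BinaryMatrix = List[List[int]]


def stack_cell_types(per_cell_type: Dict[str, BinaryMatrix]) -> BinaryMatrix:
    """Concatenate every cell type's binarized mark calls into one vector per bin.

    A model trained on this stacked matrix sees all datasets at once, so each
    state it learns describes a pattern across cell types rather than within one.
    Simplifying assumption (hypothetical): every cell type covers the same bins
    and reports the same marks in the same order.
    """
    cell_types = sorted(per_cell_type)  # deterministic column order
    n_bins = len(per_cell_type[cell_types[0]])
    if any(len(per_cell_type[ct]) != n_bins for ct in cell_types):
        raise ValueError("all cell types must cover the same genomic bins")
    stacked: BinaryMatrix = []
    for b in range(n_bins):
        row: List[int] = []
        for ct in cell_types:
            row.extend(per_cell_type[ct][b])  # this cell type's marks, in order
        stacked.append(row)
    return stacked


def activity_pattern(row: List[int], mark: int, n_marks: int) -> str:
    """Classify one mark in a stacked row as constitutive, cell type-specific, or absent."""
    calls = row[mark::n_marks]  # the same mark, taken from each cell type's block
    if all(calls):
        return "constitutive"
    if any(calls):
        return "cell type-specific"
    return "absent"


if __name__ == "__main__":
    # Toy data: two hypothetical cell types, three bins, two marks each.
    toy = {
        "cellA": [[1, 0], [1, 1], [0, 0]],
        "cellB": [[1, 0], [0, 1], [0, 0]],
    }
    stacked = stack_cell_types(toy)
    print(stacked)  # [[1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 0, 0]]
    print(activity_pattern(stacked[0], 0, 2))  # constitutive
    print(activity_pattern(stacked[1], 0, 2))  # cell type-specific
```

In the stacked setting, a state-learning method such as ChromHMM's multivariate hidden Markov model would be trained on vectors like these instead of on each cell type separately, which is why a single learned state can encode whether activity is shared or restricted.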
