Article

Large-Scale Quality Analysis of Published ChIP-seq Data

G3-GENES GENOMES GENETICS (2014)

Journal

G3-GENES GENOMES GENETICS

Volume 4, Issue 2, Pages 209-223

Publisher

GENETICS SOCIETY AMERICA

DOI: 10.1534/g3.113.008680

Keywords

ChIP-seq; chromatin immunoprecipitation; cross-correlation; quality assessment; transcription factor

Category

Genetics & Heredity

Funding

Beckman Foundation
Donald Bren Endowment
National Institutes of Health [U54 HG004576, U54 HG006998]

Abstract

ChIP-seq has become the primary method for identifying in vivo protein-DNA interactions on a genome-wide scale, with nearly 800 publications involving the technique appearing in PubMed as of December 2012. Individually and in aggregate, these data are an important and information-rich resource. However, uncertainties about data quality confound their use by the wider research community. Recently, the Encyclopedia of DNA Elements (ENCODE) project developed and applied metrics to objectively measure ChIP-seq data quality. The ENCODE quality analysis was useful for flagging datasets for closer inspection, eliminating or replacing poor data, and for driving changes in experimental pipelines. There had been no similarly systematic quality analysis of the large and disparate body of published ChIP-seq profiles. Here, we report a uniform analysis of vertebrate transcription factor ChIP-seq datasets in the Gene Expression Omnibus (GEO) repository as of April 1, 2012. The majority (55%) of datasets scored as being highly successful, but a substantial minority (20%) were of apparently poor quality, and approximately 25% were of intermediate quality. We discuss how different uses of ChIP-seq data are affected by specific aspects of data quality, and we highlight exceptional instances for which the metric values should not be taken at face value. Unexpectedly, we discovered that a significant subset of control datasets (i.e., no immunoprecipitation and mock immunoprecipitation samples) display an enrichment structure similar to successful ChIP-seq data. This can, in turn, affect peak calling and data interpretation. Published datasets identified here as high-quality comprise a large group that users can draw on for large-scale integrated analysis. In the future, ChIP-seq quality assessment similar to that used here could guide experimentalists at early stages in a study, provide useful input in the publication process, and be used to stratify ChIP-seq data for different community-wide uses.
