☆ 4.5 Article Proceedings Paper

AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

JOURNAL OF COMPUTATIONAL BIOLOGY (2011)

期刊

JOURNAL OF COMPUTATIONAL BIOLOGY

卷 18, 期 11, 页码 1495-1505

出版社

MARY ANN LIEBERT, INC

DOI: 10.1089/cmb.2011.0185

关键词

ChIP-Seq; cohesin; CTCF; expectation-maximization; high-throughput sequencing; mixture model; peak-caller; repetitive elements; Srebp-1

类别

Biochemical Research Methods Biotechnology & Applied Microbiology Computer Science, Interdisciplinary Applications Mathematical & Computational Biology Statistics & Probability

资金

NICHD NIH HHS [T32 HD060555, HD062951] Funding Source: Medline
NLM NIH HHS [T15LM07443] Funding Source: Medline
Direct For Biological Sciences [846218] Funding Source: National Science Foundation
Div Of Biological Infrastructure [846218] Funding Source: National Science Foundation

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, co-factors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here, we introduce a probabilistic approach for ChIP-Seq data analysis that utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem.

AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

期刊

JOURNAL OF COMPUTATIONAL BIOLOGY

出版社

MARY ANN LIEBERT, INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

期刊

JOURNAL OF COMPUTATIONAL BIOLOGY

出版社

MARY ANN LIEBERT, INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文