Extracting sequence features to predict protein-DNA interactions: a comparative study

NUCLEIC ACIDS RESEARCH (2008)

Journal

NUCLEIC ACIDS RESEARCH

Volume 36, Issue 12, Pages 4137-4148

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/nar/gkn361

Funding

Direct For Mathematical & Physical Scien [0805491] Funding Source: National Science Foundation
Division Of Mathematical Sciences [0805491] Funding Source: National Science Foundation
NIGMS NIH HHS [R01-GM080625-01A1, R01-GM07899, R01 GM080625] Funding Source: Medline

Abstract

Predicting how and where proteins, especially transcription factors (TFs), interact with DNA is an important problem in biology. We present a systematic study of predictive modeling approaches to the TF-DNA binding problem; such approaches have frequently been shown to be more efficient than methods based only on position-specific weight matrices (PWMs). In these approaches, a statistical relationship between genomic sequences and gene expression or ChIP-binding intensities is inferred through a regression framework, and influential sequence features are identified by variable selection. We examine several state-of-the-art learning methods, including stepwise linear regression, multivariate adaptive regression splines, neural networks, support vector machines, boosting and Bayesian additive regression trees (BART). These methods are applied to simulated datasets and to two whole-genome ChIP-chip datasets, on the TFs Oct4 and Sox2 respectively, in human embryonic stem cells. We find that, with proper learning methods, predictive modeling approaches can significantly improve predictive power and identify more biologically interesting features, such as TF-TF interactions, than the PWM approach. In particular, BART and boosting show the best and most robust overall performance among all the methods.
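The regression-plus-variable-selection idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes, purely for illustration, that sequence features are 3-mer counts, that the binding intensity is simulated from a planted motif (`ACG`, with a made-up effect size and noise level), and that the learner is componentwise L2 boosting, one simple member of the boosting family. All function names and parameters below are hypothetical.

```python
import itertools
import random

ALPHABET = "ACGT"


def kmer_features(seq, k=3):
    """Return (names, counts): occurrences of every DNA k-mer in seq."""
    names = ["".join(p) for p in itertools.product(ALPHABET, repeat=k)]
    index = {name: j for j, name in enumerate(names)}
    counts = [0.0] * len(names)
    for i in range(len(seq) - k + 1):
        counts[index[seq[i:i + k]]] += 1.0
    return names, counts


def simulate(n=200, length=40, motif="ACG", effect=1.5, noise=0.5, seed=0):
    """Simulated data (assumption): intensity = effect * motif count + Gaussian noise."""
    rng = random.Random(seed)
    seqs = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(n)]
    y = [effect * s.count(motif) + rng.gauss(0.0, noise) for s in seqs]
    return seqs, y


def componentwise_l2_boost(X, y, n_rounds=50, step=0.1):
    """Componentwise L2 boosting on centered features.

    Each round fits every single feature to the current residual by least
    squares, keeps the one that reduces the residual sum of squares most,
    and moves its coefficient a fraction `step` of the way. Features that
    are never picked keep a zero coefficient, which acts as variable selection.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]
    sumsq = [sum(Xc[i][j] ** 2 for i in range(n)) for j in range(p)]
    coef = [0.0] * p
    for _ in range(n_rounds):
        best_j, best_b, best_gain = None, 0.0, 0.0
        for j in range(p):
            if sumsq[j] == 0.0:
                continue  # constant feature carries no information
            dot = sum(Xc[i][j] * resid[i] for i in range(n))
            gain = dot * dot / sumsq[j]  # RSS reduction of a full univariate fit
            if gain > best_gain:
                best_j, best_b, best_gain = j, dot / sumsq[j], gain
        if best_j is None:
            break
        coef[best_j] += step * best_b
        resid = [resid[i] - step * best_b * Xc[i][best_j] for i in range(n)]
    return coef


seqs, y = simulate()
names = kmer_features(seqs[0])[0]
X = [kmer_features(s)[1] for s in seqs]
coef = componentwise_l2_boost(X, y)

# Rank features by absolute coefficient; the planted motif should lead.
ranked = sorted(zip(names, coef), key=lambda t: -abs(t[1]))
for name, c in ranked[:5]:
    print(name, round(c, 3))
```

In this toy setting the features with non-zero coefficients play the role of the "influential sequence features" in the abstract; on real ChIP-chip data the feature set, response and learner would all need to be chosen far more carefully.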
