Article

Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation

Journal

NATURE BIOTECHNOLOGY
Volume 34, Issue 12, Pages 1287-1291

Publisher

NATURE PUBLISHING GROUP
DOI: 10.1038/nbt.3682

Funding

  1. NIH [5T32CA009337-35, HG005220]
  2. National Institute of Neurological Disorders and Stroke [5R01NS054794-08]
  3. Defense Advanced Research Projects Agency [DARPA-D12AP00025]

We find that current computational methods for estimating transcript abundance from RNA-seq data can lead to hundreds of false-positive results. We show that these systematic errors stem largely from a failure to model fragment GC content bias. Sample-specific biases associated with fragment sequence features lead to misidentification of transcript isoforms. We introduce alpine, a method for estimating sample-specific bias-corrected transcript abundance. By incorporating fragment sequence features, alpine greatly increases the accuracy of transcript abundance estimates, enabling a fourfold reduction in the number of false positives for reported changes in expression compared with Cufflinks. Using simulated data, we also show that alpine retains the ability to discover true positives, similar to other approaches. The method is available as an R/Bioconductor package that includes data visualization tools useful for bias discovery.
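As a rough illustration of the fragment sequence feature the abstract refers to, the sketch below computes the GC content of individual fragments and tabulates fragment counts per GC bin. It is not alpine's implementation (alpine is an R/Bioconductor package whose model is not described here). The function names, the toy transcript, the fragment coordinates, and the choice of four bins are all assumptions made for this example.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of G or C bases in seq (case-insensitive).

    Returns 0.0 for an empty sequence.
    """
    if not seq:
        return 0.0
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def fragment_gc_histogram(transcript, fragments, n_bins=4):
    """Count fragments falling into each of n_bins equal-width GC bins.

    fragments is an iterable of (start, end) pairs, 0-based and half-open,
    giving each fragment's position on the transcript (hypothetical input
    format chosen for this sketch).
    """
    counts = Counter()
    for start, end in fragments:
        gc = gc_content(transcript[start:end])
        # A GC fraction of exactly 1.0 is placed in the last bin.
        bin_index = min(int(gc * n_bins), n_bins - 1)
        counts[bin_index] += 1
    return [counts[i] for i in range(n_bins)]


# Toy data: an AT-rich stretch, a GC-rich stretch, and a mixed stretch.
transcript = "ATATATATGCGCGCGCATGCATGC"
fragments = [(0, 8), (8, 16), (16, 24), (4, 12)]
histogram = fragment_gc_histogram(transcript, fragments)
print(histogram)
```

If a sample systematically under-represents fragments in the high-GC or low-GC bins, comparing such a histogram across samples is one simple way such a bias might become visible; a bias-corrected abundance estimate would need to account for it.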
