4.7 Article

Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing

Journal

CELL SYSTEMS
Volume 2, Issue 4, Pages 239-250

Publisher

CELL PRESS
DOI: 10.1016/j.cels.2016.04.001

Funding

  1. UCSF Center for Systems and Synthetic Biology (NIGMS) [P50 GM081879]
  2. Paul G. Allen Family Foundation
  3. NIH
  4. National Cancer Institute
  5. National Institute of Dental and Craniofacial Research (NIH) [DP5 OD012194]

A tradeoff between precision and throughput constrains all biological measurements, including sequencing-based technologies. Here, we develop a mathematical framework that defines this tradeoff between mRNA-sequencing depth and error in the extraction of biological information. We find that transcriptional programs can be reproducibly identified at 1% of conventional read depths. We demonstrate that this resilience to noise of "shallow" sequencing derives from a natural property, low dimensionality, which is a fundamental feature of gene expression data. Accordingly, our conclusions hold for approximately 350 single-cell and bulk gene expression datasets across yeast, mouse, and human. In total, our approach provides quantitative guidelines for the choice of sequencing depth necessary to achieve a desired level of analytical resolution. We codify these guidelines in an open-source read depth calculator. This work demonstrates that the structure inherent in biological networks can be productively exploited to increase measurement throughput, an idea that is now common in many branches of science, such as image processing.
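
The intuition in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' framework or their read depth calculator. It assumes a hypothetical model in which each sample is a mixture of a few gene programs, approximates sequencing as Poisson sampling of reads, and compares the principal subspace recovered at a deep and a shallow depth. All function names, sizes, and depths are illustrative assumptions.

```python
import numpy as np


def simulate_fractions(n_genes=500, n_samples=200, n_programs=4, seed=0):
    """Toy low-dimensional data: every sample is a mixture of a few programs."""
    rng = np.random.default_rng(seed)
    # Each hypothetical program is a distribution over genes (columns sum to 1).
    programs = rng.gamma(shape=1.0, size=(n_genes, n_programs))
    programs /= programs.sum(axis=0, keepdims=True)
    # Each sample mixes the programs with weights that sum to 1.
    weights = rng.dirichlet(np.ones(n_programs), size=n_samples).T
    return programs @ weights  # shape (n_genes, n_samples)


def sequence(fractions, depth, seed):
    """Assumption: read counts are Poisson with mean depth * transcript fraction."""
    rng = np.random.default_rng(seed)
    return rng.poisson(depth * fractions)


def top_subspace(counts, k):
    """Left singular vectors of the gene-centered, depth-normalized counts."""
    frac = counts / counts.sum(axis=0, keepdims=True)
    centered = frac - frac.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k]


def subspace_agreement(u_a, u_b):
    """Mean squared cosine of the principal angles (1.0 = identical subspaces)."""
    return float(np.linalg.norm(u_a.T @ u_b) ** 2 / u_a.shape[1])


def agreement_at_depth(fractions, depth, k=3, deep_depth=1_000_000):
    """Compare components from a shallow run against a deep reference run."""
    deep = top_subspace(sequence(fractions, deep_depth, seed=1), k)
    shallow = top_subspace(sequence(fractions, depth, seed=2), k)
    return subspace_agreement(deep, shallow)


fractions = simulate_fractions()
# Mixtures of 4 programs vary along 3 directions once the gene means are removed.
for depth in (10_000, 100):
    print(depth, round(agreement_at_depth(fractions, depth), 3))
```

In this toy setting, 10,000 reads per sample is 1% of the 1,000,000-read reference. Because the signal is concentrated in a few directions while sampling noise is spread across all genes, the leading subspace is recovered far more faithfully at that depth than at 100 reads per sample.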
