☆ 4.7 Article

Length bias correction for RNA-seq data in gene set analyses

BIOINFORMATICS (2011)

期刊

BIOINFORMATICS

卷 27, 期 5, 页码 662-669

出版社

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btr005

关键词

类别

Biochemical Research Methods Biotechnology & Applied Microbiology Computer Science, Interdisciplinary Applications Mathematical & Computational Biology Statistics & Probability

资金

NIH [RR024163, NIGMS-R01-GM-74913, NIDDK P30 DK07022, NCRR 1UL1RR025777-03]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Motivation: Next-generation sequencing technologies are being rapidly applied to quantifying transcripts (RNA-seq). However, due to the unique properties of the RNA-seq data, the differential expression of longer transcripts is more likely to be identified than that of shorter transcripts with the same effect size. This bias complicates the downstream gene set analysis (GSA) because the methods for GSA previously developed for microarray data are based on the assumption that genes with same effect size have equal probability (power) to be identified as significantly differentially expressed. Since transcript length is not related to gene expression, adjusting for such length dependency in GSA becomes necessary. Results: In this article, we proposed two approaches for transcript-length adjustment for analyses based on Poisson models: (i) At individual gene level, we adjusted each gene's test statistic using the square root of transcript length followed by testing for gene set using the Wilcoxon rank-sum test. (ii) At gene set level, we adjusted the null distribution for the Fisher's exact test by weighting the identification probability of each gene using the square root of its transcript length. We evaluated these two approaches using simulations and a real dataset, and showed that these methods can effectively reduce the transcript-length biases. The top-ranked GO terms obtained from the proposed adjustments show more overlaps with the microarray results.

Length bias correction for RNA-seq data in gene set analyses

期刊

BIOINFORMATICS

出版社

OXFORD UNIV PRESS

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Length bias correction for RNA-seq data in gene set analyses

期刊

BIOINFORMATICS

出版社

OXFORD UNIV PRESS

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文