Article

Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression

Journal

PLOS ONE
Volume 7, Issue 10

Publisher

Public Library of Science
DOI: 10.1371/journal.pone.0046128

Funding

  1. National Institute of General Medical Sciences of the National Institutes of Health [R01GM104977]
  2. National Research Initiative Competitive Grants Program from the USDA National Institute of Food and Agriculture [2008-35600-04691]
  3. Agriculture and Food Research Initiative Competitive Grants Program from the USDA National Institute of Food and Agriculture [2011-67019-30192]
  4. NIFA [579666, 2008-35600-04691, 582849, 2011-67019-30192] Funding Source: Federal RePORTER

When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called length bias, affects downstream analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily more biologically relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both Gene Ontology membership and the significance of the differential expression test. Including transcript length as a covariate allows one to investigate the direct correlation between Gene Ontology membership and the significance of the differential expression test, conditional on transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
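To make the confounding argument concrete, here is a minimal sketch under stated assumptions. It treats membership in one Gene Ontology category (Z) as the binary response and uses a binary differential-expression call (D) plus centered log transcript length as predictors, i.e. logit P(Z = 1) = b0 + b1 D + b2 (log L - mean log L). This is one possible formulation and not necessarily the exact parameterization used in the paper. The data are simulated so that both Z and D depend on length but are independent given length. All function names (`simulate`, `fit_logistic`, etc.) and parameter values are hypothetical. The model is fit by Newton-Raphson in pure Python only to keep the example dependency-free.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def score_and_information(X, y, beta):
    """Gradient X^T (y - mu) and Fisher information X^T W X of the log-likelihood."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = sigmoid(sum(b * x for b, x in zip(beta, xi)))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Hypothetical helper: Newton-Raphson fit returning coefficients and Wald SEs."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad, info = score_and_information(X, y, beta)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(X, y, beta)
    # Diagonal of the inverse information matrix gives the squared standard errors.
    se = [math.sqrt(solve(info, [float(i == j) for i in range(p)])[j]) for j in range(p)]
    return beta, se


def simulate(n=3000, seed=1):
    """Hypothetical genes: category membership and DE calls both depend on length only."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n):
        log_len = rng.gauss(7.5, 1.0)                               # assumed log transcript length
        x = log_len - 7.5
        in_go = int(rng.random() < sigmoid(-2.0 + 1.5 * x))         # longer genes more often in category
        is_de = int(rng.random() < sigmoid(-1.0 + 1.5 * x))         # length bias in the DE test
        genes.append((log_len, in_go, is_de))
    return genes


genes = simulate()
mean_len = sum(g[0] for g in genes) / len(genes)
y = [g[1] for g in genes]                                  # response: category membership
X_naive = [[1.0, g[2]] for g in genes]                     # intercept + DE call
X_adj = [[1.0, g[2], g[0] - mean_len] for g in genes]      # + centered log length covariate

b_naive, se_naive = fit_logistic(X_naive, y)
b_adj, se_adj = fit_logistic(X_adj, y)
z_naive = b_naive[1] / se_naive[1]
z_adj = b_adj[1] / se_adj[1]
print(f"DE coefficient without length: {b_naive[1]:.2f} (z = {z_naive:.1f})")
print(f"DE coefficient with length:    {b_adj[1]:.2f} (z = {z_adj:.1f})")
```

Because Z and D are generated independently given length, the model without length typically reports a large positive DE coefficient (a spurious "enrichment"), while the length-adjusted model's DE coefficient is close to zero. In practice one would fit the same models with an established routine such as R's `glm` with a binomial family or statsmodels rather than a hand-written solver.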
