Article

FragGeneScan: predicting genes in short and error-prone reads

Journal

NUCLEIC ACIDS RESEARCH
Volume 38, Issue 20, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkq747

Funding

  1. National Institutes of Health [1R01HG004908-02]
  2. National Science Foundation [DBI-0845685]
  3. Div Of Biological Infrastructure
  4. Direct For Biological Sciences [0845685] Funding Source: National Science Foundation

Advances in next-generation sequencing technology have facilitated metagenomics research, which attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identifying genes directly from short reads has become an important yet challenging problem in annotating metagenomes, because assembled metagenomes are often unavailable. Gene predictors developed for whole genomes (e.g. Glimmer) and those recently developed for metagenomic sequences (e.g. MetaGene) show a significant decrease in performance as sequencing error rates increase or as reads get shorter. We have developed a novel gene prediction method, FragGeneScan, which combines sequencing error models and codon usage in a hidden Markov model to improve the prediction of protein-coding regions in short reads. The performance of FragGeneScan was comparable to that of Glimmer and MetaGene for complete genomes. For short reads, however, FragGeneScan consistently outperformed MetaGene (accuracy improved by ~62% for reads of 400 bases with 1% sequencing errors, and by ~18% for error-free short reads of 100 bases). When applied to metagenomes, FragGeneScan recovered substantially more genes than MetaGene (>90% of the genes identified by homology search), as well as many novel genes with no homologs in current protein sequence databases.
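
To make the hidden Markov model idea in the abstract more concrete, the following is a minimal sketch of Viterbi decoding over a two-state model (coding vs. noncoding) applied to a single read. It is not FragGeneScan's model: the state set, every probability in `START`, `TRANS`, and `EMIT`, and the function name `viterbi` are hypothetical values chosen for illustration, and the sketch omits the codon-level states and the sequencing error (insertion/deletion) states that the abstract describes.

```python
import math

STATES = ("noncoding", "coding")

# Hypothetical toy parameters, NOT taken from FragGeneScan.
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
# Assumed emissions: the toy "coding" state simply prefers G/C.
EMIT = {
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(read):
    """Return the most likely state path for a nucleotide string (A/C/G/T)."""
    if not read:
        return []
    # scores[s] = best log-probability of any path that ends in state s.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][read[0]]) for s in STATES
    }
    backpointers = []
    for base in read[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi("AAAAAAAAGGGGGGGG")` labels each base of the read with one of the two states. A model closer to the one the abstract describes would replace the single coding state with codon-position states and add states that absorb sequencing errors, which this sketch does not attempt.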
