
Pairagon plus N-SCAN_EST: a model-based gene annotation pipeline

GENOME BIOLOGY (2006)

Journal

GENOME BIOLOGY

Volume 7

Publisher

BMC

DOI: 10.1186/gb-2006-7-s1-s5

Categories

Biotechnology & Applied Microbiology; Genetics & Heredity

Abstract

Background: This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci of putative homologs. Trans alignments contain a high proportion of mismatches, gaps, and/or apparently unspliceable introns, compared to alignments of cDNA sequences to their native loci. The Pairagon+N-SCAN_EST pipeline's first stage is Pairagon, a cDNA-to-genome alignment program based on a PairHMM probability model. This model relies on prior knowledge, such as the fact that introns must begin with GT, GC, or AT and end with AG or AC. It produces very precise alignments of high-quality cDNA sequences. In the genomic regions between Pairagon's cDNA alignments, the pipeline combines EST alignments with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST alignments. Because they are based on probability models, both Pairagon and N-SCAN_EST can be trained automatically for new genomes and data sets.

Results: On the ENCODE regions of the human genome, Pairagon+N-SCAN_EST was as accurate as any other system tested in the EGASP assessment, including ENSEMBL and ExoGean.

Conclusions: With sufficient mRNA/EST evidence, genome annotation without trans alignments can compete successfully with systems like ENSEMBL and ExoGean, which use trans alignments.
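The intron-boundary constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not Pairagon's code: the function name `intron_has_allowed_boundaries`, the 0-based half-open coordinates, and the toy sequences are assumptions made purely for illustration. A real PairHMM would encode such knowledge in its states and probabilities rather than as a filter applied afterwards.

```python
# Hypothetical helper, not taken from Pairagon: it only checks the intron
# boundary dinucleotides named in the abstract.
DONOR_DINUCLEOTIDES = {"GT", "GC", "AT"}   # allowed intron starts
ACCEPTOR_DINUCLEOTIDES = {"AG", "AC"}      # allowed intron ends


def intron_has_allowed_boundaries(genome: str, start: int, end: int) -> bool:
    """Return True if genome[start:end] begins and ends with allowed dinucleotides.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    intron = genome[start:end].upper()
    if len(intron) < 4:
        # Too short to hold separate start and end dinucleotides.
        return False
    return (intron[:2] in DONOR_DINUCLEOTIDES
            and intron[-2:] in ACCEPTOR_DINUCLEOTIDES)


if __name__ == "__main__":
    # Toy sequence: exon, then an intron starting with GT and ending with AG, then exon.
    toy_genome = "ATGGCC" + "GTAAGTCCCTTTAG" + "GGATAA"
    print(intron_has_allowed_boundaries(toy_genome, 6, 20))
```

An alignment whose proposed intron fails this check corresponds to the "apparently unspliceable introns" that the abstract associates with trans alignments.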
