4.6 Article

AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data

Journal

FRONTIERS IN GENETICS
Volume 13, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2022.1020100

Keywords

workflow; proteogenomics; genome annotation; functional annotation; hypothetical proteins

Funding

  1. CAPES (Coordination for the Improvement of Higher Education Personnel, Brazil)
  2. CNPq (National Council for Scientific and Technological Development, Brazil) [001]
  3. FAPESC (Santa Catarina Research Foundation)
  4. UFSC (Federal University of Santa Catarina)

Assignment of gene function is crucial in genomics, but manual annotation is no longer feasible due to the increasing amounts of data. An integrated pipeline called AnnotaPipeline is introduced here, which uses experimental data to validate in silico predictions of gene function. The pipeline integrates different software and data types to annotate and validate predicted features in genomic sequences.
Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Because a variety of sequencing platforms generate ever-increasing amounts of data, manual annotation is no longer feasible. There is therefore a pressing need for an integrated, automated pipeline that uses experimental data to validate in silico predictions of gene function. Here, we present a computational workflow named AnnotaPipeline that integrates distinct software and data types in a proteogenomic approach to annotate and validate predicted features in genomic sequences. Starting from (i) nucleotide or (ii) protein sequences in FASTA format, or (iii) structural annotation files (GFF3), users can additionally supply RNA-seq data (FASTQ) and MS/MS data (mzXML or similar formats). The pipeline uses both transcriptomic and proteomic information to corroborate annotations and validate gene predictions, providing transcription and expression evidence for functional annotation. Reannotation of the available Arabidopsis thaliana, Caenorhabditis elegans, Candida albicans, Trypanosoma cruzi, and Trypanosoma rangeli genomes was performed using AnnotaPipeline, resulting in a higher proportion of annotated proteins and a lower proportion of hypothetical proteins than in the publicly available annotations for these organisms. AnnotaPipeline is a Unix-based pipeline developed using Python and is available at: .
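The abstract compares annotations by the proportion of hypothetical proteins. As a rough illustration of how such a proportion might be measured, the sketch below counts FASTA headers that mention "hypothetical". It is not part of AnnotaPipeline: the function name `hypothetical_fraction`, the example records, and the assumption that unannotated proteins carry the word "hypothetical" in their header are all illustrative choices.

```python
from typing import Iterable


def hypothetical_fraction(fasta_lines: Iterable[str]) -> float:
    """Return the fraction of FASTA records whose header mentions 'hypothetical'.

    Assumption (not from the paper): unannotated proteins are labelled
    with the word 'hypothetical' somewhere in the header line.
    """
    total = 0
    hypothetical = 0
    for line in fasta_lines:
        if line.startswith(">"):  # header lines mark the start of a record
            total += 1
            if "hypothetical" in line.lower():
                hypothetical += 1
    # Avoid division by zero for empty input
    return hypothetical / total if total else 0.0


# Made-up records, for illustration only
example = """>prot1 hypothetical protein
MKV
>prot2 kinase
MST
>prot3 Hypothetical protein, conserved
MAA
>prot4 heat shock protein 70
MGG
""".splitlines()

fraction = hypothetical_fraction(example)
print(f"{fraction:.2f}")
```

Running the same function on the protein FASTA of a public annotation and on a reannotated one would give two fractions that can be compared directly, provided both files follow the same header convention.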
