Article

SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification

GENOME RESEARCH (2018)

Journal

GENOME RESEARCH

Volume 28, Issue 3, Pages 396-411

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT

DOI: 10.1101/gr.222976.117

Categories

Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Genetics & Heredity

Funding

GENCODE NIH grant [2U41 HG007234]
University of Florida Preeminence hires program
Spanish Ministry of Economy and Competitiveness [BIO2015-71658-R, BFU2014-57636-P]
Spanish Ministry of Education grant [FPU2013/02348]

Abstract

High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.
