4.7 Article

Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing

Journal

JOURNAL OF PROTEOME RESEARCH
Volume -, Issue -, Pages -

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jproteome.1c00968

Keywords

alternative splicing; protein isoform; proteogenomics; MS/MS; RNA-seq; long read RNA sequencing; direct RNA-sequencing; Illumina; Oxford Nanopore Technology

Funding

  1. Australian Research Training Program
  2. Australian Research Council [LE200100016]

This study investigated the use of long-read, nanopore-based, direct RNA sequencing for the identification of protein isoforms in human K562 cells. The results showed that this approach outperformed short-read RNA sequencing in identifying alternative splicing events and protein isoforms. The study also highlighted the benefits of using long-read RNA sequencing data in generating reference databases for the identification of protein isoforms.
Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
