Article

Three-nucleotide periodicity of nucleotide diversity in a population enables the identification of open reading frames

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 23, Issue 4

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbac210

Keywords

open reading frame; sORF; SNPs; population; polyploidy genome

Funding

  1. National Key Research and Development Program of China [2019YFA0707000]
  2. Agricultural Science and Technology Innovation Program of CAAS [CAAS-GXAAS-XTCX2019026-1]
  3. Guangdong Innovation Research Team Fund [2014ZT05S078]

Accurate prediction of ORFs is essential for studying and using genome sequences. This study introduces an approach that uses the nucleotide periodicity of population genomic variants to predict ORFs, and develops the OrfPP software package to implement and validate it. The method is reported to be accurate and reliable across several complex genomes.
Accurate prediction of open reading frames (ORFs) is important for studying and using genome sequences. Ribosomes move along mRNA strands in steps of three nucleotides, and datasets that carry this information can be used to predict ORFs. Ribosome-protected footprints (RPFs) show a significant 3-nt periodicity on mRNAs and are powerful for predicting translated ORFs, including small ORFs (sORFs), but their application is limited because they are too short to be mapped accurately in complex genomes. In this study, we found a significant 3-nt periodicity in population genomic variant datasets within coding sequences: nucleotide diversity increases every three nucleotides. We suggest that this feature can be used to predict ORFs and develop the Python package 'OrfPP', which recovers approximately 83% of the annotated ORFs in the tested genomes on average, independent of population size and genome complexity. The novel ORFs, including sORFs, identified from single-nucleotide polymorphisms are supported by protein mass spectrometry evidence comparable to that of the annotated ORFs. Applying OrfPP to the tetraploid cotton and hexaploid wheat genomes identified 76.17% and 87.43% of the annotated ORFs, respectively. It also identified 4704 sORFs in cotton, including 1182 upstream and 2110 downstream ORFs, and 5025 sORFs in wheat, including 232 upstream and 234 downstream ORFs. Overall, we propose an alternative and supplementary approach for ORF prediction that can extend the study of sORFs to more complex genomes.
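
The periodicity signal described in the abstract can be illustrated with a small sketch. This is not the OrfPP implementation, whose internals are not described here. It only assumes that each SNP is given as a 0-based position with an alternate-allele frequency, that per-site diversity is approximated by the biallelic expected heterozygosity 2p(1-p), and that the reading frame is the one whose third codon position accumulates the most diversity. The function names and the toy data are hypothetical.

```python
# Illustrative sketch only; NOT the OrfPP algorithm.
# Assumptions: biallelic SNPs, 0-based coordinates, diversity = 2p(1-p).

def site_diversity(alt_freq):
    """Expected heterozygosity of a biallelic site with alt frequency p."""
    return 2.0 * alt_freq * (1.0 - alt_freq)


def diversity_by_codon_position(snps, start):
    """Sum per-site diversity into phases 0, 1, 2 relative to `start`.

    snps  : iterable of (position, alt_allele_frequency)
    start : coordinate treated as the first base of a codon
    """
    totals = [0.0, 0.0, 0.0]
    for pos, freq in snps:
        if pos < start:
            continue  # ignore sites upstream of the region
        totals[(pos - start) % 3] += site_diversity(freq)
    return totals


def best_frame(snps, region_start):
    """Return the frame offset (0, 1 or 2) from `region_start` whose
    third codon position carries the largest summed diversity."""
    totals = diversity_by_codon_position(snps, region_start)
    # For frame offset f, third codon positions fall in phase (f + 2) % 3.
    return max(range(3), key=lambda f: totals[(f + 2) % 3])


# Hypothetical toy data: most diversity sits at every third base
# of a coding region that begins at coordinate 100.
toy_snps = [(102, 0.3), (103, 0.05), (105, 0.5),
            (108, 0.2), (110, 0.1), (111, 0.4)]

totals = diversity_by_codon_position(toy_snps, 100)
frame = best_frame(toy_snps, 99)  # scan starts one base upstream
print(totals, frame)
```

In this toy example the scan starts one base upstream of the coding region, so the selected offset points back to coordinate 100. A practical predictor would also need a statistical test for periodicity and a way to delimit ORF boundaries, both of which this sketch omits.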
