Article

Using networks to analyze and visualize the distribution of overlapping genes in virus genomes

Journal

PLOS PATHOGENS
Volume 18, Issue 2

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.ppat.1010331

Funding

  1. Natural Sciences and Engineering Research Council of Canada [05516-2018 RGPIN]

This study presents a global comparative analysis of overlapping open reading frames (OvRFs) in 12,609 virus reference genomes from the NCBI database. The results show that the number of OvRFs increases with genome length, but that OvRFs tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, and antisense overlaps tend to be longer. The study also develops a new graph-based representation to visualize the distribution of overlaps among genomes in a virus family.
Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs is encoded in the same frame on the opposite strand (-0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs provide not only a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
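
As a rough illustration of the first step described above (calculating the number, length, and frameshift of OvRFs from ORF coordinates), the following minimal Python sketch may help. It is not the authors' code. The `ORF` record, the function names, and the example coordinates are all hypothetical, and the frame labels follow an assumed convention: for ORFs on the same strand the shift is the offset between their reading frames, and for ORFs on opposite strands "-0" means that the codon boundaries of the two ORFs coincide. Linear genomes and single-interval ORFs are assumed; circular genomes and spliced ORFs are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ORF:
    """Hypothetical ORF record; coordinates refer to the forward strand."""
    name: str
    start: int   # 0-based, inclusive (leftmost position)
    end: int     # 0-based, exclusive (rightmost position + 1)
    strand: int  # +1 (forward) or -1 (reverse)


def overlap_length(a: ORF, b: ORF) -> int:
    """Number of nucleotides shared by the two ORFs (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def frameshift(a: ORF, b: ORF) -> str:
    """Label the relative frame of two ORFs, e.g. '+2' or '-0'.

    Assumed convention (not taken from the paper):
    same strand     -> '+k', k = offset between the two reading frames;
    opposite strand -> '-k', k = 0 when codon boundaries coincide.
    """
    if a.strand == b.strand:
        if a.strand == 1:
            shift = (b.start - a.start) % 3
        else:
            # On the reverse strand, translation starts at the right end.
            shift = (a.end - b.end) % 3
        return f"+{shift}"
    fwd, rev = (a, b) if a.strand == 1 else (b, a)
    return f"-{(rev.end - fwd.start) % 3}"


def find_overlaps(orfs):
    """Return (name_a, name_b, length, frameshift) for each overlapping pair."""
    ordered = sorted(orfs, key=lambda o: (o.start, o.end))
    results = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so no later ORF can overlap `a`
            results.append((a.name, b.name, overlap_length(a, b), frameshift(a, b)))
    return results


# Made-up coordinates for illustration only.
example = [
    ORF("A", 0, 30, 1),
    ORF("B", 20, 50, 1),
    ORF("C", 45, 75, -1),
    ORF("D", 100, 130, 1),
]
overlaps = find_overlaps(example)
for pair in overlaps:
    print(pair)
```

In this toy genome, A and B overlap on the same strand and B and C overlap on opposite strands, while D overlaps nothing. Sorting by start coordinate lets the inner loop stop early, which keeps the scan cheap for genomes with many annotated ORFs.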
