Article

Significant non-existence of sequences in genomes and proteomes

Journal

NUCLEIC ACIDS RESEARCH
Volume 49, Issue 6, Pages 3139-3155

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkab139

Funding

  1. Department of Information Technology and Human Factors, National Institute of Advanced Industrial Science and Technology
  2. National Institute of Advanced Industrial Science and Technology

The study uses Markov models to identify significant absent oligomers and suggests that their absence is due to negative selection. Common significant minimal absent words (MAWs) are often mono- or dinucleotide tracts. Significant MAWs of mammal genomes are often present, but rare, in other mammals, while significant human MAWs are frequently present in prokaryotes but rarely in human viruses.
Minimal absent words (MAWs) are minimal-length oligomers absent from a genome or proteome. Although some artificially synthesized MAWs have deleterious effects, there is still no strategy for classifying non-occurring sequences as potentially malicious or benign. In this work, by using Markovian models with multiple-testing correction, we reveal significant absent oligomers, which are statistically expected to exist. This suggests that their absence is due to negative selection. We survey genomes and proteomes covering the diversity of life and find thousands of significant absent sequences. Common significant MAWs are often mono- or dinucleotide tracts, or palindromic. Significant viral MAWs are often restriction sites and may indicate unknown restriction motifs. Surprisingly, significant mammal genome MAWs are often present, but rare, in other mammals, suggesting that they are suppressed but not completely forbidden. Significant human MAWs are frequently present in prokaryotes, suggesting immune function, but rarely present in human viruses, indicating viral mimicry of the host. More than one-fourth of human proteins are one substitution away from containing a significant MAW, with the majority of replacements being predicted harmful. We provide a web-based, interactive database of significant MAWs across genomes and proteomes.
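The two ideas in the abstract, a MAW (an absent word whose proper factors all occur) and a "significant" absence (the word is statistically expected under a Markov model), can be illustrated on a single sequence. The Python sketch below is not the authors' pipeline. The brute-force enumeration, the order-1 Markov chain fitted to the sequence itself, the Poisson approximation P(zero occurrences) ≈ exp(-E), and the Bonferroni correction are all assumptions chosen for illustration. The function names (`minimal_absent_words`, `fit_markov`, `expected_count`, `significant_maws`) are hypothetical.

```python
import math
from collections import Counter


def minimal_absent_words(seq, alphabet="ACGT", max_len=6):
    """Brute-force MAWs of length <= max_len.

    A word w is a MAW if it does not occur in seq but both w[:-1] and w[1:]
    occur (which implies every proper factor of w occurs).
    """
    maws = []
    prev = {""}  # present words of length k-1; the empty word always occurs
    for k in range(1, max_len + 1):
        if not prev:
            break
        cur = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for prefix in prev:
            for letter in alphabet:
                w = prefix + letter
                if w not in cur and w[1:] in prev:
                    maws.append(w)
        prev = cur
    return sorted(maws, key=lambda w: (len(w), w))


def fit_markov(seq, order=1):
    """Counts for an order-m Markov chain estimated from seq itself (assumption)."""
    m = order
    start = Counter(seq[i:i + m] for i in range(len(seq) - m + 1))   # m-mer counts
    follow = Counter(seq[i:i + m] for i in range(len(seq) - m))      # m-mers followed by a letter
    trans = Counter(seq[i:i + m + 1] for i in range(len(seq) - m))   # (m+1)-mer counts
    return m, start, follow, trans


def expected_count(model, word, seq_len):
    """Expected number of occurrences of word under the fitted chain."""
    m, start, follow, trans = model
    prob = start[word[:m]] / sum(start.values())
    for i in range(m, len(word)):
        context = word[i - m:i]
        if follow[context] == 0:
            return 0.0
        prob *= trans[word[i - m:i + 1]] / follow[context]
    return prob * max(seq_len - len(word) + 1, 0)


def significant_maws(seq, alphabet="ACGT", max_len=6, order=1, alpha=0.05):
    """MAWs whose absence is unlikely under the chain (Bonferroni-corrected)."""
    model = fit_markov(seq, order)
    # Only words longer than order + 1 get a non-zero expectation: all their
    # (m+1)-mers are proper factors and therefore occur in seq.
    tested = [w for w in minimal_absent_words(seq, alphabet, max_len) if len(w) > order + 1]
    if not tested:
        return []
    threshold = alpha / len(tested)  # Bonferroni over all tested words
    hits = []
    for w in tested:
        e = expected_count(model, w, len(seq))
        p_absent = math.exp(-e)  # Poisson P(count == 0); ignores overlap dependence
        if p_absent < threshold:
            hits.append((w, e, p_absent))
    return hits


print(minimal_absent_words("AAC", max_len=4))  # → ['G', 'T', 'CA', 'CC', 'AAA']
```

In the printed example, "AAA" is a MAW of "AAC" because it is absent while both of its length-2 factors ("AA") occur. In a long random-like sequence from which "GATC" has been edited out, `significant_maws` would flag "GATC`: an order-1 chain expects it roughly (n - 3)/256 times, so its absence is highly improbable. Brute-force enumeration is only practical for short words and small inputs. Genome-scale MAW computation typically relies on suffix-array or suffix-tree algorithms instead.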
