Article

Petabase-scale sequence alignment catalyses viral discovery

Journal

NATURE
Volume 602, Issue 7895, Pages 142-+

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41586-021-04332-2

Funding

  1. University of British Columbia
  2. Max Planck Society
  3. Klaus Tschira Foundation
  4. ANR Transipedia, Inception and PRAIRIE grants [PIA/ANR16-CONV-0005, ANR-18CE45-0020, ANR-19-P3IA-0001]
  5. Ministerio de Economia y Competitividad of Spain
  6. FEDER [BFU2017-87370-P, PID2020-116008GB-I00]
  7. Russian Science Foundation [19-14-00172]

Public databases contain a vast amount of nucleic acid sequences, but efficient methods for systematic exploration of this library have been lacking. In this study, we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. By searching millions of diverse samples, we identified over 10^5 novel RNA viruses and characterized their environmental reservoirs. To facilitate viral discovery, we established a comprehensive database of these data and tools. Expanding the known sequence diversity of viruses has important implications for understanding emerging pathogens and improving pathogen surveillance.
Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially (ref. 1). Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
