Article

Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences

Journal

BMC GENOMICS
Volume 23, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12864-022-08358-2

Keywords

SARS-CoV-2; Genomic surveillance; Spike; Pango; Lineage

Funding

  1. Oxford Martin School
  2. Wellcome Trust Hosts, Pathogens & Global Health Programme [203783/Z/16/Z]
  3. Fast Grants [2236]
  4. Wellcome Trust [203783/Z/16/Z] Funding Source: Wellcome Trust

The study investigates how SARS-CoV-2 Pango lineages can be reliably designated using spike-only nucleotide sequences. While many lineages can be identified clearly with spike-only sequences, some sequences are shared among multiple lineages. The concept of lineage-sets is introduced to represent the range of Pango lineages consistent with observed mutations in spike sequences, providing a foundation for software tools to assign newly-generated sequences to lineage sets.
Background

More than 2 million SARS-CoV-2 genome sequences have been generated and shared since the start of the COVID-19 pandemic and constitute a vital information source that informs outbreak control, disease surveillance, and public health policy. The Pango dynamic nomenclature is a popular system for classifying and naming genetically-distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. It is therefore important to understand how much information about Pango lineage status is contained in spike-only nucleotide sequences. Here we explore how Pango lineages might be reliably designated and assigned to spike-only nucleotide sequences. We survey the genetic diversity of such sequences, and investigate the information they contain about Pango lineage status.

Results

Although many lineages, including the main variants of concern, can be identified clearly using spike-only sequences, some spike-only sequences are shared among tens or hundreds of Pango lineages. To facilitate the classification of SARS-CoV-2 lineages using subgenomic sequences we introduce the notion of designating such sequences to a lineage set, which represents the range of Pango lineages that are consistent with the observed mutations in a given spike sequence.

Conclusions

We find that many lineages, including the main variants-of-concern, can be reliably identified by spike alone and we define lineage-sets to represent the lineage precision that can be achieved using spike-only nucleotide sequences. These data provide a foundation for the development of software tools that can assign newly-generated spike nucleotide sequences to Pango lineage sets.
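
The lineage-set idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, for illustration only, that a lineage set is found by exact matching of whole spike sequences, and the function names, toy sequences, and lineage labels (lineage_A and so on) are all hypothetical.

```python
from collections import defaultdict


def build_lineage_sets(designated):
    """Group lineage labels by spike sequence.

    `designated` is an iterable of (spike_sequence, lineage) pairs taken
    from genomes that already have a lineage designation. The result maps
    each distinct spike sequence to the set of lineages it was seen in.
    Assumption: sequences are compared by exact string equality.
    """
    groups = defaultdict(set)
    for spike_seq, lineage in designated:
        groups[spike_seq].add(lineage)
    return {seq: frozenset(lineages) for seq, lineages in groups.items()}


def assign_lineage_set(spike_seq, lineage_sets):
    """Return the lineage set for a new spike sequence.

    An empty set means the sequence was not seen among designated genomes.
    """
    return lineage_sets.get(spike_seq, frozenset())


# Toy input: placeholder sequences and hypothetical lineage labels.
designated = [
    ("ACGTAA", "lineage_A"),
    ("ACGTAA", "lineage_A"),
    ("ACGTCC", "lineage_B"),
    ("ACGTCC", "lineage_C"),
    ("ACGTCC", "lineage_D"),
]

lineage_sets = build_lineage_sets(designated)

# A spike sequence seen in one lineage yields a singleton set;
# one shared across lineages yields a larger, less precise set.
unique_hit = assign_lineage_set("ACGTAA", lineage_sets)
shared_hit = assign_lineage_set("ACGTCC", lineage_sets)
no_hit = assign_lineage_set("TTTTTT", lineage_sets)
```

In this sketch the size of the returned set stands in for the lineage precision mentioned in the conclusions: a singleton corresponds to a lineage identifiable from spike alone, while a large set corresponds to a spike sequence shared among many lineages.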
