Review

Data structures based on k-mers for querying large collections of sequencing data sets

Journal

GENOME RESEARCH
Volume 31, Issue 1, Pages 1-12

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
DOI: 10.1101/gr.260604.119

Funding

  1. ANR Transipedia [ANR-18-CE45-0020]
  2. ANR INCEPTION [PIA/ANR-16-CONV-0005]
  3. National Science Foundation [1453527, 1439057, 1618814]
  4. National Institutes of Health, National Institute of Allergy and Infectious Diseases [R01AI141810-01]
  5. National Science Foundation, Directorate for Computer and Information Science and Engineering, Division of Computing and Communication Foundations [1439057]
  6. National Science Foundation, Directorate for Computer and Information Science and Engineering, Division of Information and Intelligent Systems [1618814]
  7. National Science Foundation, Directorate for Computer and Information Science and Engineering, Division of Information and Intelligent Systems [1453527]
  8. Agence Nationale de la Recherche (ANR) [ANR-18-CE45-0020]

High-throughput sequencing data sets are deposited in public repositories for reproducibility, but the sheer size of the data prevents repositories from offering online sequence searches. In recent years, computational approaches based on representing data sets as sets of k-mers have been introduced to address this issue, each with its own performance characteristics and limitations.
High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, yet such a feature would be highly useful to investigators. Toward this goal, several computational approaches have been introduced in the last few years to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.
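
The idea of representing data sets as sets of k-mers can be illustrated with a minimal Python sketch. It is not taken from the article and does not reproduce any of the surveyed tools: the function names (`kmers`, `build_index`, `query`) and the containment threshold `theta` are assumptions introduced here for illustration, and plain in-memory sets stand in for the compact data structures a real index would need. Reverse complements are ignored for brevity.

```python
from typing import Dict, Iterable, List, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(datasets: Dict[str, Iterable[str]], k: int) -> Dict[str, Set[str]]:
    """Represent each data set by the union of the k-mers of its sequences."""
    index: Dict[str, Set[str]] = {}
    for name, sequences in datasets.items():
        kmer_set: Set[str] = set()
        for sequence in sequences:
            kmer_set |= kmers(sequence, k)
        index[name] = kmer_set
    return index


def query(index: Dict[str, Set[str]], sequence: str, k: int, theta: float) -> List[str]:
    """Return the names of data sets that contain at least a fraction
    `theta` (hypothetical threshold) of the query's distinct k-mers."""
    query_kmers = kmers(sequence, k)
    if not query_kmers:
        # Query shorter than k: nothing to look up.
        return []
    return sorted(
        name
        for name, kmer_set in index.items()
        if len(query_kmers & kmer_set) / len(query_kmers) >= theta
    )


# Toy collection: two "data sets", each holding one short read.
toy_index = build_index({"A": ["ACGTACGT"], "B": ["TTTTGGGG"]}, k=4)
hits = query(toy_index, "ACGTAC", k=4, theta=0.8)
```

In this toy example every 4-mer of the query occurs in data set A and none occurs in B, so only A is reported. Storing exact k-mer sets like this would not scale to petabytes of data, which is the problem the surveyed approaches set out to address.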
