Journal
GENOME RESEARCH
Volume 31, Issue 1, Pages 1-12
Publisher
COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
DOI: 10.1101/gr.260604.119
Funding
- ANR Transipedia [ANR-18-CE45-0020]
- ANR INCEPTION [PIA/ANR-16-CONV-0005]
- National Science Foundation [1453527, 1439057, 1618814]
- National Institutes of Health, National Institute of Allergy and Infectious Diseases [R01AI141810-01]
- Division of Computing and Communication Foundations
- Direct For Computer & Info Scie & Enginr [1439057] Funding Source: National Science Foundation
- Div Of Information & Intelligent Systems
- Direct For Computer & Info Scie & Enginr [1618814] Funding Source: National Science Foundation
- Div Of Information & Intelligent Systems
- Direct For Computer & Info Scie & Enginr [1453527] Funding Source: National Science Foundation
- Agence Nationale de la Recherche (ANR) [ANR-18-CE45-0020] Funding Source: Agence Nationale de la Recherche (ANR)
High-throughput sequencing data sets are deposited in public repositories for reproducibility, but their size makes online sequence searches impractical. In recent years, computational approaches that represent data sets as sets of k-mers have been introduced to address this issue, each with its own performance trade-offs and limitations.
High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, yet such a feature would be highly useful to investigators. Toward this goal, in the last few years several computational approaches have been introduced to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.
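The abstract describes indexing a collection by representing each data set as a set of k-mers and then querying it with a sequence. The following Python is a minimal sketch of that idea and does not reproduce any specific tool from the survey. It builds a toy exact index that maps canonical k-mers (the lexicographic minimum of a k-mer and its reverse complement) to data-set identifiers, then reports data sets that contain at least a fraction `theta` of the query's k-mers. The class `KmerIndex`, the `theta` threshold, the use of canonical k-mers, and the example run names are illustrative assumptions.

```python
from collections import defaultdict

# Complement table for the four nucleotides; other symbols are filtered out below.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ALPHABET = set("ACGT")


def canonical_kmers(seq: str, k: int) -> set:
    """Return the set of canonical k-mers of seq (min of k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ALPHABET:  # skip k-mers containing N or other symbols
            rc = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


class KmerIndex:
    """Hypothetical toy exact index: canonical k-mer -> set of data-set IDs containing it."""

    def __init__(self, k: int):
        self.k = k
        self.postings = defaultdict(set)

    def add_dataset(self, name: str, reads: list) -> None:
        # A data set is reduced to the union of the k-mers of its reads.
        for read in reads:
            for kmer in canonical_kmers(read, self.k):
                self.postings[kmer].add(name)

    def query(self, seq: str, theta: float = 0.8) -> list:
        """Return data sets that contain at least a fraction theta of the query's k-mers."""
        qkmers = canonical_kmers(seq, self.k)
        if not qkmers:
            return []
        hits = defaultdict(int)
        for kmer in qkmers:
            # .get does not insert missing keys into the defaultdict
            for name in self.postings.get(kmer, ()):
                hits[name] += 1
        return sorted(n for n, c in hits.items() if c / len(qkmers) >= theta)


if __name__ == "__main__":
    index = KmerIndex(k=5)
    index.add_dataset("run_A", ["ACGTACGTTAGC", "GGCATTACGA"])
    index.add_dataset("run_B", ["TTTTGGGGCCCCAAAA"])
    print(index.query("ACGTACGTTAG"))  # ['run_A']
```

A plain dictionary of k-mers keeps the sketch readable, but it would not scale to repository-sized collections; the approaches surveyed here exist precisely to store and query such k-mer sets far more compactly.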