Article; Proceedings Paper

Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants

BMC BIOINFORMATICS (2018)

Journal

BMC BIOINFORMATICS

Volume 19, Issue -, Pages -

Publisher

BMC

DOI: 10.1186/s12859-018-2333-9

Keywords

Similarity search; Similarity join; K-mer; Filtering; Edit distance; Hamming distance

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Funding

NIH grant [1R01EB025022-01]


Abstract

Background: Many biological analysis tasks require extracting families of genetically similar sequences from the large datasets produced by next-generation sequencing (NGS). Such tasks include detecting viral transmissions by analyzing all genetically close pairs of sequences from viral datasets sampled from infected individuals, and studying the evolution of viruses or immune repertoires by analyzing networks of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naive algorithms for extracting such sequence families are impractical given the massive size of modern NGS datasets.

Results: In this paper, we present a fast and scalable k-mer-based framework for performing such sequence similarity queries efficiently, which specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and running time than the other tools it was compared with. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj

Conclusion: The proposed tool allows efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analyzing the relatedness of viral genomes with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that, besides its applications in molecular epidemiology, the tool could also be adapted to immunosequencing and metagenomics data.
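The abstract describes the framework only at a high level. As a generic illustration of the filter-and-verify idea behind k-mer-based similarity joins named in the keywords, the sketch below finds all pairs of equal-length reads within Hamming distance d. It is not the signature-sj algorithm: it swaps in the classic pigeonhole segment filter, assumes equal-length sequences and Hamming rather than edit distance, and the function names (`hamming`, `segment_bounds`, `similarity_join`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def segment_bounds(length: int, parts: int) -> list[tuple[int, int]]:
    """Split positions [0, length) into `parts` contiguous, non-overlapping segments."""
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def similarity_join(seqs: list[str], d: int) -> set[tuple[int, int]]:
    """Return all index pairs (i, j), i < j, whose Hamming distance is <= d.

    Filter (hypothetical stand-in for the paper's k-mer filter): if two
    equal-length strings differ in at most d positions, at least one of d + 1
    non-overlapping segments is mismatch-free (pigeonhole principle), so only
    pairs sharing some segment exactly need to be compared.
    Verify: each candidate pair is checked with a full Hamming distance.
    """
    if d < 0:
        raise ValueError("distance threshold must be non-negative")
    if not seqs:
        return set()
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("this sketch assumes equal-length sequences")
    if d >= length:
        # Every pair is within distance d, so no filtering is possible.
        return set(combinations(range(len(seqs)), 2))

    # Index every sequence by (segment id, segment content).
    bounds = segment_bounds(length, d + 1)
    index: dict[tuple[int, str], list[int]] = defaultdict(list)
    for i, s in enumerate(seqs):
        for seg_id, (lo, hi) in enumerate(bounds):
            index[(seg_id, s[lo:hi])].append(i)

    # Candidates share at least one exact segment; bucket indices are increasing.
    candidates: set[tuple[int, int]] = set()
    for bucket in index.values():
        candidates.update(combinations(bucket, 2))

    return {(i, j) for i, j in candidates if hamming(seqs[i], seqs[j]) <= d}


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTACGT", "GGGGGGGG"]
    print(sorted(similarity_join(reads, 1)))  # [(0, 1)]
    print(sorted(similarity_join(reads, 3)))  # [(0, 1), (0, 2)]
```

The filter is what makes this cheaper than the naive all-pairs comparison the abstract calls impractical: only reads that share an exact segment are ever compared in full. Highly repetitive data can still produce large buckets, which is one reason practical tools use more selective filters.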
