Article

Measuring reproducibility of virus metagenomics analyses using bootstrap samples from FASTQ-files

Journal

BIOINFORMATICS
Volume 37, Issue 8, Pages 1068-1075

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btaa926

Funding

  1. Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) [398066876/GRK 2485/1]

The study introduces a resampling approach for evaluating the reliability of high-throughput sequencing data, with applications in virus metagenomics. Results show high reproducibility in uncovering viruses in sequencing data and indicate that bootstrap read counts can help judge the evidence for virus presence.
Motivation: High-throughput sequencing data can be affected by different technical errors, e.g. from probe preparation or false base calling. As a consequence, the reproducibility of experiments can be weakened. In virus metagenomics, technical errors can result in falsely identified viruses in samples from infected hosts. We present a new resampling approach based on bootstrap sampling of sequencing reads from FASTQ files, which generates artificial replicates of sequencing runs that can help to judge the robustness of an analysis. In addition, we evaluate a mixture model on the distribution of read counts per virus to identify potentially false positive findings.

Results: The evaluation of our approach on an artificially generated dataset with known viral sequence content shows, in general, a high reproducibility of uncovering viruses in sequencing data, i.e. the original and mean bootstrap read counts were highly correlated. However, the bootstrap read counts can also indicate reduced or increased evidence for the presence of a virus in the biological sample. We also found that the mixture model fits the read counts well and, furthermore, that it provides a higher accuracy on the original or on the bootstrap read counts than on the difference between both. The usefulness of our methods is further demonstrated on two freely available real-world datasets from harbor seals.
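
The abstract does not give implementation details, so the following is only a minimal sketch of what bootstrap sampling of reads from a FASTQ file could look like, not the authors' code. It assumes single-end reads stored as four lines per record; the function names (`read_fastq`, `bootstrap_reads`, `bootstrap_replicates`) and the tiny in-memory example are hypothetical and chosen for illustration.

```python
import random
from io import StringIO


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples.

    Assumes the simple four-lines-per-record FASTQ layout.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of stream
            break
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def bootstrap_reads(records, rng):
    """Draw len(records) reads with replacement (one artificial replicate)."""
    return rng.choices(records, k=len(records))


def bootstrap_replicates(records, n_replicates, seed=0):
    """Generate several artificial replicates from one list of reads."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [bootstrap_reads(records, rng) for _ in range(n_replicates)]


# Hypothetical three-read FASTQ content, used only to exercise the sketch.
example = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
records = list(read_fastq(StringIO(example)))
replicates = bootstrap_replicates(records, n_replicates=5, seed=42)
```

Each replicate could then be passed through the same virus detection pipeline as the original file, and the per-virus read counts compared with the original counts. Paired-end data would need both mates of a pair to be sampled together, which this sketch does not handle.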
