Article

EPGA-SC: A Framework for de novo Assembly of Single-Cell Sequencing Reads

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2019.2945761

Keywords

Sequential analysis; Genomics; Tools; Error correction; DNA; Technological innovation; Bioinformatics; Single-cell sequencing data; sequencing biases; repetitive regions; chimeric errors; assembly

Funding

  1. National Natural Science Foundation of China [61732009, 61772557]
  2. Hunan Provincial Science and Technology Program [2018wk4001]
  3. 111 Project

This study introduces EPGA-SC, a new framework for de novo assembly of single-cell sequencing data that addresses sequencing errors, sequencing biases, and repetitive regions. By classifying reads, using high-precision paired-end reads derived from the assemblies of other assemblers, and applying novel algorithms for chimeric error removal and contig extension, EPGA-SC outperforms most current tools in terms of MAX contig, N50, NG50, NA50, and NGA50.
Assembling genomes from single-cell sequencing data is essential for single-cell studies. However, single-cell assemblies are challenging because of (i) highly non-uniform read coverage and (ii) elevated levels of sequencing errors and chimeric reads. Although several assemblers for single-cell data have been proposed in recent years, most of them fail to construct correct long contigs. In this study, we present a new framework called EPGA-SC for de novo assembly of single-cell sequencing reads. The EPGA assembler already includes strategies for the problems caused by sequencing errors, sequencing biases, and repetitive regions. However, the extremely unbalanced coverage and the richer error types of single-cell sequencing data prevent EPGA from achieving high performance on such data. We therefore designed EPGA-SC on the basis of EPGA. The main innovations of EPGA-SC are as follows: (i) classifying reads to reduce the proportion of false reads; (ii) using multiple sets of high-precision paired-end reads generated from the high-precision assemblies produced by other assemblers, such as SPAdes, to overcome the impact of sequencing biases and repetitive regions; and (iii) developing novel algorithms for removing chimeric errors and extending contigs. We tested EPGA-SC on seven datasets. The experimental results show that EPGA-SC generates better assemblies than most current tools in most cases in terms of MAX contig, N50, NG50, NA50, and NGA50.
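
For readers unfamiliar with the contiguity metrics named above, the following is a minimal sketch of how N50 and NG50 are commonly computed from a list of contig lengths. It is not taken from EPGA-SC; the function name `nx`, its parameters, and the example lengths are hypothetical and chosen only for illustration. NA50 and NGA50 are usually obtained by applying the same calculation to aligned blocks, that is, contigs broken at misassembly points after alignment to a reference.

```python
from typing import Iterable, Optional


def nx(lengths: Iterable[int], total: Optional[int] = None, fraction: float = 0.5) -> int:
    """Return the length L of the contig at which the running sum of
    contig lengths, taken from longest to shortest, first reaches
    `fraction` of `total`.

    total=None      -> use the assembly size (N50 when fraction=0.5)
    total=<genome>  -> use the reference genome size (NG50 when fraction=0.5)
    """
    sorted_lengths = sorted(lengths, reverse=True)
    if total is None:
        total = sum(sorted_lengths)
    threshold = fraction * total
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= threshold:
            return length
    # The contigs never reach the threshold; this can happen for NG50
    # when the assembly is much smaller than the genome.
    return 0


# Hypothetical contig lengths (assembly size = 300).
contigs = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(contigs)               # threshold is 150 bases
ng50 = nx(contigs, total=400)   # assumed genome size of 400, threshold is 200 bases
print(n50, ng50)
```

Because NG50 is normalized by the genome size rather than the assembly size, it allows assemblies of different total length to be compared on the same dataset, which matters for single-cell data where coverage gaps often leave assemblies incomplete.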
