Article

A Study of Cell-Free DNA Fragmentation Pattern and Its Application in DNA Sample Type Classification

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2017.2723388

Keywords

Cell free DNA; liquid biopsy; fragmentation; pattern recognition

Funding

  1. National Science Foundation of China (NSFC) [61472411]
  2. Technology Development and Creative Design Program of Nanshan Shenzhen [KC2015JSJS0028A]
  3. Special Funds for Future Industries of Shenzhen [JSGG20160229123927512]


Plasma cell-free DNA (cfDNA) has characteristic fragmentation patterns, which produce non-random base content curves in the first cycles of its sequencing data. We studied these patterns and found that whether a sample is cfDNA can be determined from the first 10 cycles of its base content curves alone. We analyzed 3,189 FastQ files, comprising 1,442 plasma cfDNA, 1,234 genomic DNA, 507 FFPE tumour DNA, and 6 urinary cfDNA samples. In-depth analysis of these data showed that the patterns are stable enough to distinguish cfDNA from other kinds of DNA samples. Based on this finding, we trained pattern recognition models with several classification algorithms, including k-nearest neighbors (KNN), random forest, and support vector machine (SVM), to recognize cfDNA samples from their sequencing data. In 1,000 iterations of .632+ bootstrapping, every classifier achieved an average accuracy above 98 percent, indicating that the cfDNA patterns are distinctive and make the dataset highly separable. The best result came from the random forest classifier, with an average accuracy of 99.89 percent (sigma = 0.00068). A tool called CfdnaPattern (http://github.com/OpenGene/CfdnaPattern) has been developed to train the model and to predict whether a sample is cfDNA.
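To make the idea of a "base content curve" concrete, the sketch below computes the fraction of A, C, G, and T at each of the first 10 sequencing cycles from FASTQ records. It is only an illustration under stated assumptions and is not taken from CfdnaPattern: the function name `base_content_features`, the flat cycle-major feature layout, and the choice to skip `N` calls are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List

BASES = "ACGT"


def base_content_features(fastq_lines: Iterable[str], cycles: int = 10) -> List[float]:
    """Return per-cycle A/C/G/T fractions for the first `cycles` cycles.

    The result is a flat, cycle-major list: [A1, C1, G1, T1, A2, C2, ...].
    This layout is an assumption made for illustration only.
    """
    counts = [Counter() for _ in range(cycles)]
    for i, line in enumerate(fastq_lines):
        # In a 4-line FASTQ record, the sequence is the second line.
        if i % 4 != 1:
            continue
        seq = line.strip().upper()
        for cycle, base in enumerate(seq[:cycles]):
            if base in BASES:  # ambiguous calls such as N are skipped
                counts[cycle][base] += 1

    features: List[float] = []
    for cycle_counts in counts:
        total = sum(cycle_counts.values())
        for base in BASES:
            # Cycles with no observed bases contribute zeros.
            features.append(cycle_counts[base] / total if total else 0.0)
    return features


if __name__ == "__main__":
    # Two toy reads, four cycles each.
    records = ["@r1", "ACGT", "+", "IIII", "@r2", "AAGT", "+", "IIII"]
    print(base_content_features(records, cycles=4))
```

With the default of 10 cycles, each sample yields a 40-value vector. One such vector per FastQ file, paired with a label for the sample type, could then be passed to a standard classifier such as KNN, random forest, or SVM, as the abstract describes.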
