Journal
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
Volume 15, Issue 5, Pages 1718-1722
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2017.2723388
Keywords
Cell free DNA; liquid biopsy; fragmentation; pattern recognition
Funding
- National Science Foundation of China (NSFC) [61472411]
- Technology Development and Creative Design Program of Nanshan Shenzhen [KC2015JSJS0028A]
- Special Funds for Future Industries of Shenzhen [JSGG20160229123927512]
Plasma cell-free DNA (cfDNA) has characteristic fragmentation patterns, which produce non-random base content curves in the first cycles of its sequencing data. We studied these patterns and found that whether a sample is cfDNA can be determined from the first 10 cycles of its base content curves alone. We analyzed 3,189 FastQ files, comprising 1,442 plasma cfDNA, 1,234 genomic DNA, 507 FFPE tumour DNA, and 6 urinary cfDNA samples. In-depth analysis of these data showed that the patterns are stable enough to distinguish cfDNA from other kinds of DNA samples. Based on this finding, we trained pattern recognition models with several classification algorithms, including k-nearest neighbors (KNN), random forest, and support vector machine (SVM), to recognize cfDNA samples from their sequencing data. Results from 1,000 iterations of .632+ bootstrapping showed that all of these classifiers achieved an average accuracy above 98 percent, indicating that the cfDNA patterns are distinctive and make the dataset highly separable. The best result was obtained with a random forest classifier, which reached an average accuracy of 99.89 percent (sigma = 0.00068). A tool called CfdnaPattern (http://github.com/OpenGene/CfdnaPattern) has been developed to train the model and to predict whether a sample is cfDNA.
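To make the idea of "base content curves of the first 10 cycles" concrete, the following minimal Python sketch computes, for each of the first cycles, the fraction of reads showing A, T, C, or G, and flattens the result into a feature vector. It is an illustration under stated assumptions, not the CfdnaPattern implementation: the function names `read_sequences` and `base_content_features`, the A/T/C/G feature ordering, and the choice to ignore ambiguous bases such as N are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List

BASES = "ATCG"  # assumed feature ordering; chosen here for illustration only


def read_sequences(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip().upper()


def base_content_features(sequences: Iterable[str], cycles: int = 10) -> List[float]:
    """Return per-cycle base fractions, flattened base by base.

    With the default of 10 cycles the result has 40 values:
    A at cycles 1..10, then T, then C, then G.
    """
    counts = [Counter() for _ in range(cycles)]
    for seq in sequences:
        for cycle, base in enumerate(seq[:cycles]):
            if base in BASES:  # assumption: skip N and other ambiguous calls
                counts[cycle][base] += 1

    features: List[float] = []
    for base in BASES:
        for cycle in range(cycles):
            total = sum(counts[cycle].values())
            features.append(counts[cycle][base] / total if total else 0.0)
    return features


if __name__ == "__main__":
    # Two toy reads, only to show the shape of the output.
    toy_fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "AGGT", "+", "IIII"]
    print(base_content_features(read_sequences(toy_fastq), cycles=4))
```

A vector of this kind, computed per FastQ file, could in principle be passed to any standard classifier such as KNN, random forest, or SVM; the exact features and preprocessing used by the authors may differ.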