Journal
NATURE COMMUNICATIONS
Volume 13, Issue 1
Publisher
NATURE PORTFOLIO
DOI: 10.1038/s41467-022-31666-w
Funding
- Stichting Hanarth Fonds, The Netherlands
This study demonstrates the use of whole-genome DNA sequencing and a machine learning model called Cancer of Unknown Primary Location Resolver to classify metastatic tumors, improving diagnosis and treatment decision-making.
Cancers of unknown primary (CUP) origin, in which the tumor tissue of origin (TOO) cannot be determined, account for ~3% of all cancer diagnoses. Using a uniformly processed dataset encompassing 6756 whole-genome sequenced primary and metastatic tumors, we develop Cancer of Unknown Primary Location Resolver (CUPLR), a random forest TOO classifier that employs 511 features based on simple and complex somatic driver and passenger mutations. CUPLR distinguishes 35 cancer (sub)types with ~90% recall and ~90% precision based on cross-validation and test set predictions. We find that structural variant-derived features increase the performance and utility for classifying specific cancer types. With CUPLR, we could determine the TOO for 82/141 (58%) of CUP patients. Although CUPLR is based on machine learning, it provides a human-interpretable graphical report with detailed feature explanations. The comprehensive output of CUPLR complements existing histopathological procedures and can enable improved diagnostics for CUP patients.
The original tumor location can be unclear for metastatic tumors. Here, the authors show that DNA sequencing of whole genomes can be used to classify metastatic tumors using a machine learning model, Cancer of Unknown Primary Location Resolver, in order to improve diagnosis and inform treatment decisions.
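The recall and precision figures quoted in the abstract are per-class classification metrics. As a minimal sketch of how such metrics are computed from true and predicted tissue-of-origin labels, the Python below uses a hypothetical helper, `per_class_precision_recall`, and made-up toy labels; neither comes from the paper or from the CUPLR code. The only numbers taken from the abstract are the 82 of 141 CUP patients.

```python
from collections import Counter


def per_class_precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label in either list.

    Hypothetical helper for illustration; not part of CUPLR.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    true_count = Counter(true_labels)        # samples that really are each class
    pred_count = Counter(predicted_labels)   # samples predicted as each class
    true_pos = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )

    metrics = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        # Precision: of the samples predicted as this class, how many are right.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        # Recall: of the samples truly in this class, how many were found.
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Toy labels, invented purely to exercise the function.
y_true = ["Breast", "Breast", "Lung", "Lung", "Lung", "Colorectal"]
y_pred = ["Breast", "Lung", "Lung", "Lung", "Colorectal", "Colorectal"]
metrics = per_class_precision_recall(y_true, y_pred)

# Fraction of CUP patients with a determined TOO, using the abstract's counts.
cup_resolved_percent = round(82 / 141 * 100)
```

In this toy case one of the two breast samples is misassigned to lung, so breast recall drops while breast precision stays perfect, which is why the abstract reports the two metrics separately.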