Article

Machine learning-based tissue of origin classification for cancer of unknown primary diagnostics using genome-wide mutation features

Journal

NATURE COMMUNICATIONS
Volume 13, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41467-022-31666-w

Funding

  1. Stichting Hanarth Fonds, The Netherlands

This study demonstrates the use of whole-genome DNA sequencing and a machine learning model called Cancer of Unknown Primary Location Resolver to classify metastatic tumors, with the aim of improving diagnosis and treatment decision-making.
Cancers of unknown primary (CUP) account for ~3% of all cancer diagnoses; in these cases, the tumor tissue of origin (TOO) cannot be determined. Using a uniformly processed dataset of 6756 whole-genome sequenced primary and metastatic tumors, we develop Cancer of Unknown Primary Location Resolver (CUPLR), a random forest TOO classifier that uses 511 features based on simple and complex somatic driver and passenger mutations. CUPLR distinguishes 35 cancer (sub)types with ~90% recall and ~90% precision, based on cross-validation and test set predictions. We find that structural variant-derived features increase performance and utility for classifying specific cancer types. With CUPLR, we could determine the TOO for 82 of 141 (58%) CUP patients. Although CUPLR is based on machine learning, it provides a human-interpretable graphical report with detailed feature explanations. The comprehensive output of CUPLR complements existing histopathological procedures and can enable improved diagnostics for CUP patients.

The original tumor location can be unclear for metastatic tumors. Here, the authors show that whole-genome DNA sequencing can be used to classify metastatic tumors with a machine learning model, Cancer of Unknown Primary Location Resolver, in order to improve diagnosis and inform treatment decisions.
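The abstract reports recall and precision for CUPLR's 35 cancer (sub)types but does not say how these are aggregated across classes. As a minimal, generic sketch (not the CUPLR implementation), the code below shows how per-class precision and recall, and their unweighted (macro) averages, can be computed from true and predicted tissue-of-origin labels. The function names `per_class_metrics` and `macro_average`, and the toy labels in the usage note, are hypothetical.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall)} for a multiclass prediction set.

    Hypothetical helper for illustration; not part of CUPLR.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be the same length")
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted_count = Counter(y_pred)  # how often each class was predicted
    actual_count = Counter(y_true)     # how often each class truly occurs
    metrics = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        # Precision: of the samples predicted as cls, the fraction that truly are cls.
        precision = (true_positives[cls] / predicted_count[cls]
                     if predicted_count[cls] else 0.0)
        # Recall: of the samples that truly are cls, the fraction predicted as cls.
        recall = (true_positives[cls] / actual_count[cls]
                  if actual_count[cls] else 0.0)
        metrics[cls] = (precision, recall)
    return metrics


def macro_average(metrics):
    """Unweighted mean of per-class precision and recall (assumed aggregation)."""
    n = len(metrics)
    mean_precision = sum(p for p, _ in metrics.values()) / n
    mean_recall = sum(r for _, r in metrics.values()) / n
    return mean_precision, mean_recall
```

For example, with true labels `["Breast", "Breast", "Lung", "Colon"]` and predictions `["Breast", "Lung", "Lung", "Colon"]`, Breast has precision 1.0 and recall 0.5, Lung has precision 0.5 and recall 1.0, and both macro averages are 5/6. A macro average weights every cancer type equally, so rare types influence the summary as much as common ones; a sample-weighted average would instead be dominated by the most frequent types.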
