Article

Development of a machine learning framework for radiation biomarker discovery and absorbed dose prediction

Journal

FRONTIERS IN ONCOLOGY
Volume 13

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fonc.2023.1156009

Keywords

ionizing radiation; radionuclides; absorbed dose; biomarkers; transcriptomics; kNN (k nearest neighbor); caret

This study developed a resource-efficient machine learning framework for the discovery of radiation biomarkers using gene expression data from irradiated normal tissues. By using a KNN-based approach, novel potential radiation biomarkers were identified and showed good performance in dose and tissue separation as well as dose regression.
Background: Molecular radiation biomarkers are an emerging tool in radiation research with applications in cancer radiotherapy, radiation risk assessment, and even human space travel. However, biomarker screening in genome-wide expression datasets using conventional tools is time-consuming and subject to analyst (human) bias. Machine learning (ML) methods can improve the sensitivity and specificity of biomarker identification, increase analytical speed, and avoid multicollinearity and human bias.

Aim: To develop a resource-efficient ML framework for radiation biomarker discovery using gene expression data from irradiated normal tissues. A further aim was to identify biomarker panels that predict radiation dose with tissue specificity.

Methods: A strategic search of the Gene Expression Omnibus database identified a transcriptomic dataset (GSE44762) of normal tissue radiation responses (murine kidney cortex and medulla) suited to biomarker discovery using an ML approach. The dataset was pre-processed in R and separated into train and test subsets. The high computational cost of the Genetic Algorithm/k-Nearest Neighbor (GA/KNN) approach mandated optimization, and 13 ML models were tested using the caret package in R. Biomarker performance was evaluated and visualized via Principal Component Analysis (PCA) and dose regression. The novelty of the ML-identified biomarker panels was evaluated by literature search.

Results: Caret-based feature selection and ML methods vastly improved processing time over the GA approach. The KNN method yielded the best overall performance on train and test data and was implemented in the framework. The top-ranking genes were Cdkn1a, Gria3, Mdm2, and Plk2 in cortex, and Brf2, Ccng1, Cdkn1a, Ddit4l, and Gria3 in medulla. These candidates successfully separated dose groups and tissues in PCA. Regression analysis showed a high correlation between predicted and true dose, with R² of 0.97 and 0.99 for cortex and medulla, respectively.

Conclusion: The caret framework is a powerful tool for radiation biomarker discovery that combines performance with resource efficiency, making it suitable for broad implementation in the field. The KNN-based approach identified Brf2, Ddit4l, and Gria3 mRNA as novel candidates that have not previously been characterized as radiation biomarkers. The biomarker panel showed good performance in dose and tissue separation and in dose regression. Further training with larger cohorts is warranted to improve accuracy, especially for lower doses.
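
To make the dose-regression step concrete, the following is a minimal sketch of k-nearest-neighbor dose prediction scored with R². It is not the authors' caret pipeline in R: it is written in plain Python, runs on synthetic data rather than GSE44762, and every name in it (knn_predict, r_squared, make_synthetic), as well as the dose levels, slopes, noise level, and k, is a hypothetical choice made only for illustration.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=3):
    """Predict a dose as the mean dose of the k nearest training samples."""
    # Rank training samples by Euclidean distance in expression space.
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def r_squared(y_true, y_pred):
    """Coefficient of determination between true and predicted doses."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def make_synthetic(doses, n_per_dose, slopes, noise_sd, rng):
    """Build toy expression profiles: each gene responds linearly to dose."""
    xs, ys = [], []
    for dose in doses:
        for _ in range(n_per_dose):
            xs.append([s * dose + rng.gauss(0.0, noise_sd) for s in slopes])
            ys.append(dose)
    return xs, ys


# Assumed values for illustration only; none come from the study.
rng = random.Random(0)
doses = [0.0, 1.0, 2.0, 4.0]       # hypothetical dose levels
slopes = [1.0, 0.8, 1.2, 0.5]      # hypothetical per-gene dose responses

train_x, train_y = make_synthetic(doses, 6, slopes, 0.1, rng)
test_x, test_y = make_synthetic(doses, 3, slopes, 0.1, rng)

# Predict each held-out sample from the training set, then score the fit.
predicted = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
r2 = r_squared(test_y, predicted)
print(f"R^2 on held-out synthetic samples: {r2:.3f}")
```

Because the synthetic dose groups are well separated, this toy example fits far more easily than real expression data would. It illustrates only the mechanics of predicting a dose from a small gene panel and scoring the prediction against the true dose on held-out samples.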
