4.6 Article

Development of a machine learning framework for radiation biomarker discovery and absorbed dose prediction

Journal

FRONTIERS IN ONCOLOGY
Volume 13, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fonc.2023.1156009

Keywords

ionizing radiation; radionuclides; absorbed dose; biomarkers; transcriptomics; kNN (k nearest neighbor); caret

This study developed a resource-efficient machine learning framework for the discovery of radiation biomarkers using gene expression data from irradiated normal tissues. By using a KNN-based approach, novel potential radiation biomarkers were identified and showed good performance in dose and tissue separation as well as dose regression.
Background: Molecular radiation biomarkers are an emerging tool in radiation research with applications for cancer radiotherapy, radiation risk assessment, and even human space travel. However, biomarker screening in genome-wide expression datasets using conventional tools is time-consuming and subject to analyst (human) bias. Machine learning (ML) methods can improve the sensitivity and specificity of biomarker identification, increase analytical speed, and avoid multicollinearity and human bias.
Aim: To develop a resource-efficient ML framework for radiation biomarker discovery using gene expression data from irradiated normal tissues, and to identify biomarker panels that predict radiation dose with tissue specificity.
Methods: A strategic search of the Gene Expression Omnibus database identified a transcriptomic dataset (GSE44762) of normal tissue radiation responses (murine kidney cortex and medulla) suited for biomarker discovery using an ML approach. The dataset was pre-processed in R and separated into train and test data subsets. The high computational cost of the Genetic Algorithm/k-Nearest Neighbor (GA/KNN) approach mandated optimization, and 13 ML models were tested using the caret package in R. Biomarker performance was evaluated and visualized via Principal Component Analysis (PCA) and dose regression. The novelty of the ML-identified biomarker panels was evaluated by literature search.
Results: Caret-based feature selection and ML methods vastly improved processing time over the GA approach. The KNN method yielded the best overall performance values on train and test data and was implemented into the framework. The top-ranking genes were Cdkn1a, Gria3, Mdm2, and Plk2 in cortex, and Brf2, Ccng1, Cdkn1a, Ddit4l, and Gria3 in medulla. These candidates successfully categorized dose groups and tissues in PCA. Regression analysis showed a high correlation between predicted and true dose, with R² of 0.97 and 0.99 for cortex and medulla, respectively.
Conclusion: The caret framework is a powerful tool for radiation biomarker discovery, combining performance with resource efficiency for broad implementation in the field. The KNN-based approach identified Brf2, Ddit4l, and Gria3 mRNA as novel candidates that have not been characterized as radiation biomarkers to date. The biomarker panel showed good performance in dose and tissue separation and in dose regression. Further training with larger cohorts is warranted to improve accuracy, especially for lower doses.
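
To make the dose-regression step in the abstract concrete, the following is a minimal sketch of k-nearest-neighbor dose prediction and the R² calculation used to compare predicted and true dose. It is not the authors' pipeline: the study used the caret package in R, whereas this sketch uses plain Python. The two-gene panel, the dose levels, and all expression values are invented for illustration, and the function names `knn_predict` and `r_squared` are hypothetical.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a dose as the mean dose of the k nearest training samples."""
    # Euclidean distance from the query to every training sample.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination between true and predicted values."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# ASSUMPTION: synthetic data only. Each sample is a pair of expression
# values for a hypothetical two-gene panel that rises with dose.
doses = [0.0, 1.0, 2.0, 4.0]          # invented dose levels
offsets = [-0.1, 0.0, 0.1]            # invented replicate-to-replicate noise

train_x, train_y = [], []
for d in doses:
    for e in offsets:
        train_x.append((d + e, 0.5 * d + e))
        train_y.append(d)

# One held-out sample per dose level, with a different noise offset.
test_x = [(d + 0.05, 0.5 * d + 0.05) for d in doses]
test_y = list(doses)

predicted = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
r2 = r_squared(test_y, predicted)

print("predicted doses:", predicted)
print("R^2:", r2)
```

On this toy data the dose groups are well separated, so the fit is far cleaner than anything expected from real expression data. With real measurements, overlapping dose groups would pull predictions toward neighboring dose levels, which is one way to read the abstract's caution about accuracy at lower doses.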
