Article

Novel ensemble feature selection techniques applied to high-grade gastroenteropancreatic neuroendocrine neoplasms for the prediction of survival

Publisher

ELSEVIER IRELAND LTD
DOI: 10.1016/j.cmpb.2023.107934

Keywords

Feature selection; Survival analysis; Expert- and data-driven model; Neuroendocrine neoplasia

This study evaluates the application of modern ensemble feature selection techniques for predicting overall survival in patients with high-grade gastroenteropancreatic neuroendocrine neoplasms. The results demonstrate that these feature selectors provide accurate predictions and that integrating expert knowledge can improve the stability of the feature set. WHO Performance Status, Albumin, Platelets, Ki-67, Tumor Morphology, Total MTV, Total TLG, and SUVmax are identified as the most stable and predictive features in this study.
Background and Objective: Determining the most informative features for predicting the overall survival of patients diagnosed with high-grade gastroenteropancreatic neuroendocrine neoplasms is crucial to improve individual treatment plans for patients, as well as the biological understanding of the disease. The main objective of this study is to evaluate the use of modern ensemble feature selection techniques for this purpose with respect to (a) quantitative performance measures such as predictive performance, (b) clinical interpretability, and (c) the effect of integrating prior expert knowledge.
Methods: The Repeated Elastic Net Technique for Feature Selection (RENT) and the User-Guided Bayesian Framework for Feature Selection (UBayFS) are recently developed ensemble feature selectors investigated in this work. Both allow the user to identify informative features in datasets with low sample sizes and focus on model interpretability. While RENT is purely data-driven, UBayFS can integrate expert knowledge a priori in the feature selection process. In this work, we compare both feature selectors on a dataset comprising 63 patients and 110 features from multiple sources, including baseline patient characteristics, baseline blood values, tumor histology, imaging, and treatment information.
Results: Our experiments involve data-driven and expert-driven setups, as well as combinations of both. In a five-fold cross-validated experiment without expert knowledge, our results demonstrate that both feature selectors allow accurate predictions: A reduction from 110 to approximately 20 features (around 82%) delivers near-optimal predictive performances with minor variations according to the choice of the feature selector, the predictive model, and the fold. Thereafter, we use findings from clinical literature as a source of expert knowledge. In addition, expert knowledge has a stabilizing effect on the feature set (an increase in stability of approximately 40%), while the impact on predictive performance is limited.
Conclusions: The features WHO Performance Status, Albumin, Platelets, Ki-67, Tumor Morphology, Total MTV, Total TLG, and SUVmax are the most stable and predictive features in our study. Overall, this study demonstrated the practical value of feature selection in medical applications not only to improve quantitative performance but also to deliver potentially new insights to experts.
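
The abstract quotes a feature reduction of around 82% and a gain in feature-set stability, but does not say how stability is measured. The sketch below is one possible way to make these quantities concrete, not the authors' method: it assumes stability is the mean pairwise Jaccard similarity between the feature sets selected in each cross-validation fold. The function names and the three toy folds are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def selection_frequency(selected_per_fold):
    """Return, for each feature, the fraction of folds that selected it."""
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    n_folds = len(selected_per_fold)
    return {feature: count / n_folds for feature, count in counts.items()}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_stability(selected_per_fold):
    """Mean pairwise Jaccard similarity between per-fold feature sets.

    Assumption: this is only one of several stability measures in use;
    the abstract does not state which one the study applies.
    """
    sets = [set(fold) for fold in selected_per_fold]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two folds")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def reduction_ratio(n_total, n_selected):
    """Fraction of features removed by the selector."""
    return (n_total - n_selected) / n_total


# Hypothetical toy selections for three folds (NOT results from the study).
toy_folds = [
    ["Ki-67", "Albumin", "Platelets"],
    ["Ki-67", "Albumin", "SUVmax"],
    ["Ki-67", "Platelets", "SUVmax"],
]

freq = selection_frequency(toy_folds)
stability = jaccard_stability(toy_folds)
reduction_pct = round(100 * reduction_ratio(110, 20), 1)

print(freq)           # per-feature selection frequency across folds
print(stability)      # mean pairwise Jaccard similarity
print(reduction_pct)  # 110 -> 20 features, as quoted in the abstract
```

With 110 candidate features and roughly 20 retained, `reduction_ratio` gives about 0.82, matching the "around 82%" in the abstract. A feature with a selection frequency of 1.0 was chosen in every fold, which is the informal sense in which a feature is "stable" in this sketch.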
