4.7 Article

Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models

Journal

Publisher

ELSEVIER IRELAND LTD
DOI: 10.1016/j.cmpb.2021.106504

Keywords

Deep learning; Machine learning; Data set size; Interactions

Funding

  1. Association Nationale de la Recherche et de la Technologie (ANRT) [2019/1373]
  2. Service de Biostatistique-Bioinformatique des Hospices Civils de Lyon

This study compares the impact of training dataset size and interactions on the performance of machine learning and deep learning models. The results show that machine learning models are less influenced by dataset size but require interaction terms to achieve good performance, while deep learning models can achieve good performance even without interaction terms. Overall, well-specified machine learning models outperform deep learning models.
Background and objective: Machine learning and deep learning models are very powerful in predicting the presence of a disease. To achieve good predictions, those models require a certain amount of training data, yet this amount (i) is generally limited and difficult to obtain, and (ii) increases with the complexity of the interactions between the outcome (disease presence) and the model variables. This study compares the ways training dataset size and interactions affect the performance of those prediction models.

Methods: To compare the two influences, several datasets were simulated that differed in the number of observations and in the complexity of the interactions between the variables and the outcome. A few logistic regression and neural network models were trained on the simulated datasets; their performance was evaluated by cross-validation and compared using accuracy, F1 score, and AUC.

Results: Models trained on simulated datasets without interactions provided good results: AUC close to 0.80 with either logistic regression or neural networks. Models trained on simulated datasets with order 2 interactions also led to AUCs close to 0.80 with either logistic regression or neural networks. Models trained on simulated datasets with order 4 interactions led to AUCs close to 0.80 with neural networks and 0.85 with penalized logistic regressions. Whatever the interaction order, increasing the dataset size did not significantly affect model performance, especially that of machine learning models.

Conclusion: Machine learning models were the least influenced by dataset size but needed interaction terms to achieve good performance, whereas deep learning models could achieve good performance without interaction terms. In conclusion, within the considered scenarios, well-specified machine learning models outperformed deep learning models.
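The role of interaction terms in logistic regression can be illustrated with a small simulation. The sketch below is not the study's code: the two-predictor design, the coefficient 2.0, the sample sizes, and all function names (`simulate`, `add_interaction`, `fit_logistic`, `cv_auc`, `auc`) are assumptions chosen for illustration, and it relies only on the Python standard library rather than a machine learning package. It simulates an outcome driven by an order 2 interaction, then compares the cross-validated AUC of a main-effects-only logistic regression with that of a model that includes the interaction term. Neural networks, penalization, and the other metrics are left out.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    return 1 / (1 + math.exp(-t)) if t >= 0 else math.exp(t) / (1 + math.exp(t))


def simulate(n, seed=0):
    """Simulate n observations whose outcome depends only on the product x1 * x2.

    The coefficient 2.0 is an illustrative assumption, not a value from the study.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = sigmoid(2.0 * x1 * x2)
        X.append([x1, x2])
        y.append(1 if rng.random() < p else 0)
    return X, y


def add_interaction(X):
    # Append the order 2 interaction term x1 * x2 to each row.
    return [[x1, x2, x1 * x2] for x1, x2 in X]


def fit_logistic(X, y, epochs=100, lr=1.0):
    """Unpenalized logistic regression fitted by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return w, b


def auc(scores, labels):
    """AUC from the rank-sum (Mann-Whitney U) statistic; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(rank + 1 for rank, i in enumerate(order) if labels[i] == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def cv_auc(X, y, k=5):
    """Mean held-out AUC over k folds (fold f holds out rows with index % k == f)."""
    aucs = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in test]
        aucs.append(auc(scores, [y[i] for i in test]))
    return sum(aucs) / k


if __name__ == "__main__":
    for n in (200, 1000):
        X, y = simulate(n)
        print(n, "main effects:", round(cv_auc(X, y), 2),
              "with interaction:", round(cv_auc(add_interaction(X), y), 2))
```

In this toy setting the main-effects model has no signal to use, so its held-out AUC should stay near chance, while adding the product term lets the same model recover the signal. This mirrors, in a much simpler form, the abstract's point that logistic regression needs the interaction terms to be specified.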
