4.7 Article

Critical evaluation of deep neural networks for wrist fracture detection

期刊

SCIENTIFIC REPORTS
卷 11, 期 1, 页码 -

出版社

NATURE PORTFOLIO
DOI: 10.1038/s41598-021-85570-2

关键词

-

资金

  1. Research Unit of Medical Imaging, Physics and Technology, University of Oulu

向作者/读者索取更多资源

The study revealed that while the Deep Learning model for wrist fracture detection performs near perfectly on a general test set, it shows lower performance on a challenging test set requiring confirmation by CT imaging. This underscores the importance of meticulous analysis of DL-based models before clinical implementation and the necessity for more challenging settings in testing medical AI systems.
Wrist Fracture is the most common type of fracture with a high incidence rate. Conventional radiography (i.e. X-ray imaging) is used for wrist fracture detection routinely, but occasionally fracture delineation poses issues and an additional confirmation by computed tomography (CT) is needed for diagnosis. Recent advances in the field of Deep Learning (DL), a subfield of Artificial Intelligence (AI), have shown that wrist fracture detection can be automated using Convolutional Neural Networks. However, previous studies did not pay close attention to the difficult cases which can only be confirmed via CT imaging. In this study, we have developed and analyzed a state-of-the-art DL-based pipeline for wrist (distal radius) fracture detection-DeepWrist, and evaluated it against one general population test set, and one challenging test set comprising only cases requiring confirmation by CT. Our results reveal that a typical state-of-the-art approach, such as DeepWrist, while having a near-perfect performance on the general independent test set, has a substantially lower performance on the challenging test set-average precision of 0.99 (0.99-0.99) versus 0.64 (0.46-0.83), respectively. Similarly, the area under the ROC curve was of 0.99 (0.98-0.99) versus 0.84 (0.72-0.93), respectively. Our findings highlight the importance of a meticulous analysis of DL-based models before clinical use, and unearth the need for more challenging settings for testing medical AI systems.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据