Article

FDA-approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single-center study

Journal

NEURORADIOLOGY
Volume 64, Issue 5, Pages 981-990

Publisher

SPRINGER
DOI: 10.1007/s00234-021-02874-w

Keywords

Artificial intelligence; Deep learning; Intracranial hemorrhage; Computed tomography; Diagnostic accuracy

An FDA-approved and CE-certified deep learning software application was less accurate than a second-year resident in detecting intracranial hemorrhage on head CT. The study highlights the importance of thoughtful workflow integration and post-approval validation of AI applications in various clinical environments.
Purpose: To compare an FDA-approved and CE-certified deep learning (DL) software application with human radiologists in detecting intracranial hemorrhage (ICH).

Methods: During a 20-week trial from January to May 2020, 2210 adult non-contrast head CT scans were performed at a single center. Each scan was automatically analyzed by an artificial intelligence (AI) solution integrated into the clinical workflow. After 22 scans were excluded because of severe motion artifacts, the images were retrospectively assessed for the presence of ICH by a second-year resident and a certified radiologist under simulated time pressure. Disagreements were resolved by a subspecialized neuroradiologist, who served as the reference standard. We calculated interrater agreement and diagnostic performance parameters, and we applied the Breslow-Day and Cochran-Mantel-Haenszel tests.

Results: ICH was present in 214 of the 2188 analyzed scans. Interrater agreement between the resident and the certified radiologist was very high (kappa = 0.89). Agreement between the resident and the reference standard was even higher (kappa = 0.93). The software produced 64 false-positive and 68 false-negative results. Its overall sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were 68.2%, 96.8%, 69.5%, 96.6%, and 94.0%, respectively. The corresponding values for the resident were 94.9%, 99.2%, 93.1%, 99.4%, and 98.8%. The accuracy of the DL application was inferior (p < 0.001) to that of both the resident and the certified radiologist.

Conclusion: A resident working under time pressure outperformed an FDA-approved DL program in detecting ICH on CT scans. Our results underline the importance of thoughtful workflow integration and post-approval validation of AI applications in various clinical environments.
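The DL application's reported figures can be checked against the counts given in the abstract. There were 2188 analyzed scans, 214 of them with ICH, and the software produced 64 false positives and 68 false negatives. The true-positive and true-negative counts (146 and 1910) are not stated directly; they are derived here by subtraction. The five metrics then follow from their standard definitions.

The Python sketch below is illustrative only. The names `ConfusionCounts`, `counts_from_abstract`, and `diagnostic_metrics` are hypothetical. They are not part of the study's analysis or of the vendor software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table against the reference standard (hypothetical helper type)."""
    tp: int  # ICH present and flagged
    fp: int  # ICH absent but flagged
    fn: int  # ICH present but missed
    tn: int  # ICH absent and not flagged


def counts_from_abstract(total: int, positives: int, fp: int, fn: int) -> ConfusionCounts:
    """Derive TP and TN by subtraction from the totals stated in the abstract."""
    negatives = total - positives
    return ConfusionCounts(tp=positives - fn, fp=fp, fn=fn, tn=negatives - fp)


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard diagnostic performance parameters, as fractions in [0, 1]."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


if __name__ == "__main__":
    # Counts for the DL application as reported in the abstract.
    dl = counts_from_abstract(total=2188, positives=214, fp=64, fn=68)
    print(dl)  # ConfusionCounts(tp=146, fp=64, fn=68, tn=1910)
    for name, value in diagnostic_metrics(dl).items():
        print(f"{name}: {100 * value:.1f}%")
```

Rounded to one decimal place, the computed values match the reported 68.2%, 96.8%, 69.5%, 96.6%, and 94.0%. The stated counts and percentages are therefore internally consistent.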
