4.7 Article

External validation of deep learning-based bone-age software: a preliminary study with real world data

Journal

SCIENTIFIC REPORTS
Volume 12, Issue 1, Pages -

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41598-022-05282-z

Keywords

-

Funding

  1. MSIT (Ministry of Science and ICT), Korea, under the ICT Creative Consilience program [IITP-2021-0-01819]

This study evaluated the clinical performance of a commercially available deep learning software for bone-age assessment using real-world data. The software showed good correlation with the estimates of three reviewers and performed similarly to, or even better than, the reviewers when compared against chronological age.
Artificial intelligence (AI) is increasingly being used in bone-age (BA) assessment because manual assessment is complicated and time-consuming. We aimed to evaluate the clinical performance of a commercially available deep learning (DL)-based software for BA assessment using real-world data. From Nov. 2018 to Feb. 2019, 474 children (35 boys, 439 girls, age 4-17 years) were enrolled. We compared the BA estimated by the DL software (DL-BA) with that independently estimated by 3 reviewers (R1: musculoskeletal radiologist, R2: radiology resident, R3: pediatric endocrinologist) using the traditional Greulich-Pyle atlas, and then with each child's chronological age (CA). A paired t-test, Pearson's correlation coefficient, Bland-Altman plots, mean absolute error (MAE) and root mean square error (RMSE) were used for the statistical analysis. The intraclass correlation coefficient (ICC) was used to assess inter-rater variation. There were significant differences between DL-BA and each reviewer's BA (P < 0.025), but the estimates correlated well with one another (r = 0.983, P < 0.025). RMSE (MAE) values were 10.09 (7.21), 10.76 (7.88) and 13.06 (10.06) months between DL-BA and the BA of R1, R2 and R3, respectively. Compared with the CA, RMSE (MAE) values were 13.54 (11.06), 15.18 (12.11), 16.19 (12.78) and 19.53 (17.71) months for DL-BA and the BA of R1, R2 and R3, respectively. Bland-Altman plots revealed a general tendency of both the software and the reviewers to overestimate the BA. ICC values between the 3 reviewers were 0.97, 0.85 and 0.86, and the overall ICC value was 0.93. Compared with the chronological age, the BA estimated by the DL-based software showed performance statistically similar to, or even better than, that of the reviewers in a real-world clinic.
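
The agreement metrics named in the abstract (MAE, RMSE, and the bias and 95% limits of agreement that a Bland-Altman plot displays) can be illustrated with a minimal sketch. The function names and the four bone-age values below are hypothetical and chosen only for illustration; they are not the study's data, and this is not the authors' analysis code.

```python
import math
import statistics


def mae(a, b):
    """Mean absolute error between two equal-length lists (months)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rmse(a, b):
    """Root mean square error between two equal-length lists (months)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) of the differences a - b.

    The limits are the conventional bias +/- 1.96 sample standard deviations.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical bone ages in months (illustrative values, not study data).
dl_ba = [120.0, 96.0, 150.0, 132.0]
reviewer_ba = [114.0, 102.0, 144.0, 132.0]

print("MAE:", mae(dl_ba, reviewer_ba))
print("RMSE:", rmse(dl_ba, reviewer_ba))
print("Bland-Altman (bias, lower, upper):", bland_altman(dl_ba, reviewer_ba))
```

A positive bias in this sketch would mean the first method reads higher than the second on average. RMSE penalizes large disagreements more heavily than MAE, which is why the abstract's RMSE values are consistently larger than the paired MAE values.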
