Article

Evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy

Journal

SCIENTIFIC REPORTS
Volume 13, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41598-023-31921-0

This study evaluated the generalizability of AI algorithms using deep learning methods for identifying middle ear disease from otoscopic images. The study collected 1842 otoscopic images from three different sources and categorized them as normal or abnormal. The internal performance of the AI-otoscopy algorithms was high (mean AUC: 0.95), but the external performance on new test cohorts was lower (mean AUC: 0.76). Further efforts are needed to improve the external performance and develop a robust algorithm for real-world clinical applications.
This study aimed to evaluate the generalizability of artificial intelligence (AI) algorithms that use deep learning methods to identify middle ear disease from otoscopic images, by comparing internal with external performance. A total of 1842 otoscopic images were collected from three independent sources: (a) Van, Turkey; (b) Santiago, Chile; and (c) Ohio, USA. Images were assigned to one of two diagnostic categories: (i) normal or (ii) abnormal. Deep learning models were developed, and their internal and external performance was evaluated using area under the curve (AUC) estimates. A pooled assessment was also performed by combining all cohorts with fivefold cross validation. AI-otoscopy algorithms achieved high internal performance (mean AUC: 0.95, 95% CI: 0.80-1.00). However, performance was reduced when the algorithms were tested on external otoscopic images not used for training (mean AUC: 0.76, 95% CI: 0.61-0.91). Overall, external performance was significantly lower than internal performance (mean difference in AUC: -0.19, p ≤ 0.04). Combining the cohorts yielded high pooled performance (AUC: 0.96, standard error: 0.01). Internally applied algorithms performed well in identifying middle ear disease from otoscopic images; however, external performance was reduced when they were applied to new test cohorts. Further work is required to explore data augmentation and pre-processing techniques that might improve external performance and yield a robust, generalizable algorithm for real-world clinical applications.
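The abstract does not describe the evaluation code, so the following is only a minimal sketch of the two measurement steps it names, under stated assumptions: each image has a binary label (0 = normal, 1 = abnormal) and a single model score, AUC is computed with the rank-based (Mann-Whitney) formulation, and the fivefold split is stratified by class. The helper names `auc` and `stratified_folds` are hypothetical, and the stratification is an assumption; the abstract states only that fivefold cross validation was used.

```python
import random
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen abnormal image (label 1) receives a higher score than a randomly
    chosen normal image (label 0), with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one normal and one abnormal image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: split sample indices into k folds so that each fold
    keeps roughly the same normal/abnormal ratio as the pooled data set."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


if __name__ == "__main__":
    # Toy scores only; these are not data from the study.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In an internal/external comparison of this kind, `auc` would be applied once to a held-out split of the training cohort and once to a cohort never seen in training, and the difference (external minus internal) is the quantity the abstract reports as -0.19 on average. For the pooled assessment, each fold from `stratified_folds` would serve once as the test set while a model is trained on the remaining four.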

