Article

OtoXNet-automated identification of eardrum diseases from otoscope videos: a deep learning study for video-representing images

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 34, Issue 14, Pages 12197-12210

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-022-07107-6

Keywords

Eardrum abnormalities; Computer-assisted diagnosis; Convolutional neural networks; Otoscope; Video classification

Funding

  1. National Institute on Deafness and Other Communication Disorders [R21 DC016972]

Abstract

The lack of an objective method to evaluate the eardrum is a critical barrier to accurate diagnosis. Eardrum images are classified into normal or abnormal categories with machine learning techniques. When the input is an otoscopy video, a traditional approach requires considerable effort and expertise to manually select the representative frame(s). In this paper, we propose a novel deep learning-based method, called OtoXNet, which automatically learns features for eardrum classification from otoscope video clips. We utilized multiple composite image generation methods to construct a highly representative version of otoscopy videos to diagnose three major eardrum diseases, i.e., otitis media with effusion, eardrum perforation, and tympanosclerosis, versus normal (healthy). We compared the performance of OtoXNet against methods that use either a single composite image or a keyframe selected by an experienced human. Our dataset consists of 394 otoscopy videos from 312 patients and 765 composite images before augmentation. On the multi-class eardrum video classification task, under an eightfold cross-validation scheme, OtoXNet with multiple composite images achieved 84.8% ± 3.8% class-weighted accuracy, whereas the human-selected keyframes and single composite images yielded 81.8% ± 5.0% and 80.1% ± 4.8%, respectively. A paired t-test shows a statistically significant difference (p = 1.3 × 10⁻²) between the performance of OtoXNet with multiple composite images and with human-selected keyframes. In contrast, the difference in means between the keyframe and single-composite approaches was not significant (p = 5.49 × 10⁻¹). OtoXNet also surpasses the baseline approaches in qualitative results. Using multiple composite images to analyze eardrum abnormalities is advantageous compared to using single composite images or manual keyframe selection.
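
The abstract does not specify how the composite images are generated from the video clips. As a rough, generic illustration only, not the authors' actual algorithm, the sketch below builds a single composite by taking the per-pixel temporal median over sampled frames with OpenCV; the video file name and sampling step are hypothetical.

import cv2
import numpy as np

def composite_from_video(path, step=5):
    """Median composite over every `step`-th frame of an otoscopy clip."""
    cap = cv2.VideoCapture(path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame.astype(np.float32))
        idx += 1
    cap.release()
    if not frames:
        raise ValueError("no frames read from " + path)
    # The median suppresses transient glare and occlusions better than the mean.
    return np.median(np.stack(frames), axis=0).astype(np.uint8)

composite = composite_from_video("otoscopy_clip.mp4")  # hypothetical file name
cv2.imwrite("composite.png", composite)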
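
The reported comparison rests on a paired t-test over per-fold accuracies from the eightfold cross-validation. A minimal sketch of that test using scipy.stats.ttest_rel is shown below; the per-fold scores are hypothetical placeholders, not the paper's actual fold-level values.

import numpy as np
from scipy import stats

# Hypothetical per-fold class-weighted accuracies (8 folds each);
# NOT the paper's actual fold-level results.
multi_composite = np.array([0.88, 0.83, 0.81, 0.86, 0.90, 0.84, 0.82, 0.84])
keyframe = np.array([0.84, 0.79, 0.77, 0.83, 0.88, 0.80, 0.78, 0.85])

# The same folds are scored under both pipelines, so the samples are
# paired and ttest_rel (dependent-samples t-test) is appropriate.
t_stat, p_value = stats.ttest_rel(multi_composite, keyframe)
print("mean difference: %.3f, p-value: %.3g"
      % ((multi_composite - keyframe).mean(), p_value))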
