Article

Comparison of Convolutional Neural Network Models for Determination of Vocal Fold Normality in Laryngoscopic Images

JOURNAL OF VOICE (2022)

Journal

JOURNAL OF VOICE

Volume 36, Issue 5, Pages 590-598

Publisher

MOSBY-ELSEVIER

DOI: 10.1016/j.jvoice.2020.08.003

Keywords

Computer; Computer-assisted; Deep learning; Diagnosis; Laryngoscopic images; Neural networks; Vocal cords

Categories

Audiology & Speech-Language Pathology; Otorhinolaryngology

Funding

Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program [20000843]

Summary

Deep learning with CNNs can accurately determine vocal fold normality in laryngoscopic images, with the VGG16 and Inception V3 models outperforming the simpler CNN6 model and the more recent Xception model. Real-time classification of video streams, combining the VGG16 model, OpenCV, and Grad-CAM, shows the potential clinical application of deep learning in laryngoscopy.

Abstract

Objectives. Deep learning using convolutional neural networks (CNNs) is widely used in medical imaging research. This study was performed to investigate whether vocal fold normality in laryngoscopic images can be determined by CNN-based deep learning, to compare the accuracy of several CNN models, and to explore the feasibility of applying deep learning to laryngoscopy.

Methods. Laryngoscopy videos were screen-captured, and each image was cropped to include the abducted vocal fold region. A total of 2216 images (899 normal, 1317 abnormal) were allocated to training, validation, and test sets. The augmented training set was used to train four models: a custom-built CNN with six layers (CNN6), VGG16, Inception V3, and Xception. The trained models were applied to the test set, and receiver operating characteristic (ROC) curves and cutoff values were obtained for each model. Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were calculated. The best model was then applied to video streams, and localization of features was attempted using Grad-CAM.

Results. All of the trained models showed a high area under the ROC curve. The most discriminative cutoff levels for the probability of normality were 35.6%, 61.8%, 13.5%, and 39.7% for the CNN6, VGG16, Inception V3, and Xception models, respectively. The accuracy of the models in distinguishing normal from abnormal vocal folds in the test set was 82.3%, 99.7%, 99.1%, and 83.8%, respectively.

Conclusion. All four models showed acceptable diagnostic accuracy. VGG16 and Inception V3 performed better than the simple CNN6 model and the recently published Xception model. Real-time classification of a video stream combining the VGG16 model, OpenCV, and Grad-CAM showed the potential clinical applications of deep learning in laryngoscopy.
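The evaluation step described in the abstract (a probability-of-normality cutoff per model, followed by sensitivity, specificity, PPV, NPV, and accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say which class was treated as positive or how the "most discriminative" cutoff was chosen, so the sketch assumes normal is the positive class (label 1) and picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1). All function names and the toy scores are hypothetical.

```python
"""Hedged sketch of threshold-based evaluation for a binary normal/abnormal
classifier, mirroring the metrics listed in the abstract.

Assumptions (not stated in the abstract): label 1 = normal (positive class),
an image is called normal when its probability of normality is >= the cutoff,
and the "most discriminative" cutoff maximizes Youden's J index.
"""
from typing import Dict, Optional, Sequence, Tuple


def confusion_at_cutoff(probs: Sequence[float], labels: Sequence[int],
                        cutoff: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) when predicting normal for prob >= cutoff."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_normal = p >= cutoff
        if predicted_normal and y == 1:
            tp += 1
        elif predicted_normal:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is empty.
    return num / den if den else float("nan")


def diagnostic_metrics(probs: Sequence[float], labels: Sequence[int],
                       cutoff: float) -> Dict[str, float]:
    tp, fp, tn, fn = confusion_at_cutoff(probs, labels, cutoff)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


def roc_auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a normal image outscores an abnormal one
    (ties count as half), i.e. the Mann-Whitney formulation."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(probs: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Try each observed score as a cutoff and keep the one with the largest
    J = sensitivity + specificity - 1 (the lowest cutoff wins on ties)."""
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(probs)):
        m = diagnostic_metrics(probs, labels, cut)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the study.
    probs = [0.95, 0.9, 0.8, 0.6, 0.5, 0.2, 0.1]
    labels = [1, 1, 1, 0, 1, 0, 0]
    print(roc_auc(probs, labels))  # 11/12, about 0.9167
    cut = youden_cutoff(probs, labels)
    print(cut)  # 0.8
    print(diagnostic_metrics(probs, labels, cut))
```

On the toy data, one normal image scores below an abnormal one, so the AUC is 11/12, and Youden's J selects 0.8 (sensitivity 0.75, specificity 1.0). Youden's J is only one common way to pick an ROC operating point; a clinical study may instead favor sensitivity or specificity deliberately.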
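Grad-CAM, which the abstract uses to localize features, weights each feature map of a convolutional layer by the spatial average of the class score's gradient with respect to that map, sums the weighted maps, and keeps only the positive values. The pure-Python sketch below shows just that weighting step on hypothetical toy arrays. It assumes the activations and gradients have already been extracted from the trained network (for example, from the last convolutional layer of VGG16) with a deep learning framework; the abstract does not describe how the authors did this, and the function name `grad_cam` is illustrative.

```python
"""Hedged sketch of the Grad-CAM weighting step.

Assumption: activations[k][i][j] is feature map k of a convolutional layer and
gradients[k][i][j] is d(class score)/d(activation) at the same position, both
already obtained from a deep learning framework. Toy values are hypothetical.
"""
from typing import List

Grid = List[List[float]]


def grad_cam(activations: List[Grid], gradients: List[Grid]) -> Grid:
    rows, cols = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient for each feature map k.
    alphas = [sum(sum(row) for row in g) / (rows * cols) for g in gradients]
    # Weighted sum of the feature maps; the ReLU keeps only positive evidence.
    cam = [[max(0.0, sum(a * fmap[i][j] for a, fmap in zip(alphas, activations)))
            for j in range(cols)] for i in range(rows)]
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


if __name__ == "__main__":
    # Two hypothetical 2x2 feature maps and their gradients.
    activations = [[[2, 0], [0, 1]], [[0, 2], [2, 0]]]
    gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-0.25, -0.25], [-0.25, -0.25]]]
    print(grad_cam(activations, gradients))  # [[1.0, 0.0], [0.0, 0.5]]
```

In practice the coarse map is usually resized to the input frame (for instance with OpenCV's `cv2.resize`) and overlaid as a heat map; how the authors rendered it on the video stream is not described in the abstract.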
