Article

Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach

Journal

JOURNAL OF VOICE
Volume 33, Issue 5, Pages 634-641

Publisher

MOSBY-ELSEVIER
DOI: 10.1016/j.jvoice.2018.02.003

Keywords

Nodule; Polyp; Neoplasm; Spasmodic dysphonia; Sulcus

Funding

  1. Ministry of Science and Technology [MOST 105-2221-E-155-013-MY3, 106-2314-B-418-003, 107-2634-F-155-001]

Objectives. Computerized detection of voice disorders has attracted considerable academic and clinical interest in the hope of providing an effective screening method for voice diseases before endoscopic confirmation. This study proposes a deep-learning-based approach to detect pathological voice and examines its performance and utility compared with other automatic classification algorithms. Methods. This study retrospectively collected 60 normal voice samples and 402 pathological voice samples of 8 common clinical voice disorders in a voice clinic of a tertiary teaching hospital. We extracted Mel frequency cepstral coefficients from 3-second samples of a sustained vowel. The performances of three machine learning algorithms, namely, deep neural network (DNN), support vector machine, and Gaussian mixture model, were evaluated using fivefold cross-validation. Collective cases from the voice disorder database of MEEI (Massachusetts Eye and Ear Infirmary) were used to verify the performance of the classification mechanisms. Results. The experimental results demonstrated that the DNN outperforms the Gaussian mixture model and the support vector machine. Its accuracy in detecting voice pathologies reached 94.26% and 90.52% in male and female subjects, respectively, based on three representative Mel frequency cepstral coefficient features. When applied to the MEEI database for validation, the DNN also achieved a higher accuracy (99.32%) than the other two classification algorithms. Conclusions. By stacking several layers of neurons with optimized weights, the proposed DNN algorithm can fully utilize the acoustic features and efficiently differentiate between normal and pathological voice samples. Based on this pilot study, future research may proceed to explore more applications of DNN from laboratory and clinical perspectives.
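
The fivefold cross-validation mentioned in the Methods can be illustrated with a minimal sketch. With only 60 normal samples against 402 pathological ones, how the folds are drawn matters. The abstract does not say whether the folds were stratified by class, so stratification is an assumption here, and the function names `stratified_folds` and `cross_validation_splits` are hypothetical helpers written for illustration, not the authors' code. Only the class sizes come from the abstract.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class counts.

    Assumption: stratification by class. The abstract only states that
    fivefold cross-validation was used.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each class differs by at most
        # one sample between any two folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(
            idx
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for idx in fold
        )
        yield train_idx, test_idx


# Class sizes taken from the abstract: 60 normal (0), 402 pathological (1).
labels = [0] * 60 + [1] * 402
splits = list(cross_validation_splits(labels))

for train_idx, test_idx in splits:
    normal_in_test = sum(1 for i in test_idx if labels[i] == 0)
    print(len(train_idx), len(test_idx), normal_in_test)
```

Under this assumption every held-out fold contains 12 normal samples and 80 or 81 pathological ones, so each classifier would be tested on both classes in every round. An unstratified random split could leave a fold with very few normal samples, which would make per-fold accuracy less informative.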
