4.6 Article

Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems

Journal

COGNITIVE COMPUTATION
Volume 13, Issue 5, Pages 1136-1153

Publisher

SPRINGER
DOI: 10.1007/s12559-021-09914-w

Keywords

Mel-frequency cepstral coefficients; Standard deviation; Principal component analysis

Funding

  1. Universiti Kebangsaan Malaysia [GUP-2020-063]

Ask authors/readers for more resources

Spoken language identification involves determining and classifying natural language from given content and datasets. The proposed method in this study focuses on reducing MFCC feature dimensions to optimize identification efficiency.
Spoken language identification (LID) is the process of determining and classifying natural language from a given content and dataset. Data must be processed to extract useful features to perform LID. The mel-frequency cepstral coefficient (MFCC) is one of the most popular feature extraction techniques in LID. The MFCC features are generated to serve as inputs for the classification stage. In this study, reduction in the MFCC feature dimension is investigated because large data size affects the computational time and resources (i.e., memory space) and slows the identification speed. The implementation of data reduction techniques to retain the most important feature parameters is also evaluated in this study. The investigation of data reduction is based on standard deviation (STD) calculation and principal component analysis (PCA). The features based on MFCC and the reduced dimensions based on STD and PCA results are then used as inputs to an optimized extreme learning machine (ELM) classifier called the optimized genetic algorithm-ELM (OGA-ELM). Several sets of data samples with one dimension of principal components (i.e., 119) are utilized for the evaluation. The results are generated using two different datasets. The first dataset is derived from eight separate languages, whereas the second dataset is a part of the National Institute of Standards and Technology Language Recognition Evaluation 2009 dataset. To evaluate the performance of the proposed method, this study utilizes several assessment measures, namely, accuracy, recall, precision, F-measure, G-mean, and identification time. The best LID performance is observed when the MFCC based on STD and PCA features with 119 feature dimensions is used with OGA-ELM as the classifier. The experimental results show that the proposed MFCC method achieves 99.38% accuracy using the first dataset. Additionally, it achieves accuracies of up to 97.60%, 96.80%, and 91.20% using the second dataset with durations of 30, 10, and 3 s, respectively. The proposed MFCC method exhibits the fastest computational time in all experiments, requiring only a few seconds to identify languages. Using a data reduction technique can substantially speed up the computational time, overcome resource limitations, and improve LID performance.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available