Article

Analysis of 2D Feature Spaces for Deep Learning-Based Speech Recognition

Journal

JOURNAL OF THE AUDIO ENGINEERING SOCIETY
Volume 66, Issue 12, Pages 1072-1081

Publisher

AUDIO ENGINEERING SOC
DOI: 10.17743/jaes.2018.0066

Funding

  1. Polish National Science Centre [2015/17/B/ST6/01874]

The aim of this study was to evaluate the suitability of 2D audio signal feature maps for speech recognition based on deep learning. The proposed methodology employs a convolutional neural network (CNN), a class of deep, feed-forward artificial neural network. We analyzed four types of audio signal feature maps: spectrograms, linear and mel-scale cepstrograms, and chromagrams. This choice was motivated by the fact that CNNs perform well on 2D data. The feature maps were evaluated on a Lithuanian word recognition task. Spectral analysis led to the highest word recognition rate, and the spectral and mel-scale cepstral feature spaces outperformed linear cepstra and chroma. In the 111-word classification experiment, the F1 scores on the test data set were 0.99 for the spectrum, 0.91 for the mel-scale cepstrum, 0.76 for the chromagram, and 0.64 for the cepstrum feature space.
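
To make the notion of a 2D feature map concrete, the following is a minimal sketch of how a spectrogram and a linear-frequency cepstrogram might be computed from a waveform using NumPy only. It is not the authors' pipeline: the frame length, hop size, Hann window, number of cepstral coefficients, and the synthetic 440 Hz test tone are all assumptions chosen for illustration, and the mel-scale cepstrogram and chromagram are omitted for brevity.

```python
import numpy as np


def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping, Hann-windowed frames.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)


def spectrogram(x, frame_len=512, hop=256):
    """Log-magnitude spectrogram with shape (n_frames, frame_len // 2 + 1)."""
    frames = frame_signal(x, frame_len, hop)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-10)  # small offset avoids log(0)


def cepstrogram(x, frame_len=512, hop=256, n_coeffs=40):
    """Linear-frequency real cepstrum per frame, truncated to n_coeffs."""
    log_spec = spectrogram(x, frame_len, hop)
    # The real cepstrum is the inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_spec, n=frame_len, axis=1)
    return cepstrum[:, :n_coeffs]


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(signal)   # shape (61, 257)
cep = cepstrogram(signal)    # shape (61, 40)
```

Each result is a time-by-feature array that can be treated like a single-channel image, which is the form of input a CNN classifier of the kind described above would consume.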
