Journal
JOURNAL OF THE AUDIO ENGINEERING SOCIETY
Volume 66, Issue 12, Pages 1072-1081
Publisher
AUDIO ENGINEERING SOC
DOI: 10.17743/jaes.2018.0066
Funding
- Polish National Science Centre [2015/17/B/ST6/01874]
The aim of this study was to evaluate the suitability of 2D audio signal feature maps for speech recognition based on deep learning. The proposed methodology employs a convolutional neural network (CNN), a class of deep, feed-forward artificial neural network. We analyzed four types of audio signal feature maps: spectrograms, linear and mel-scale cepstrograms, and chromagrams. This choice was motivated by the fact that CNNs perform well on 2D data. The feature maps were applied to a Lithuanian word recognition task. Spectral analysis led to the highest word recognition rate, and the spectral and mel-scale cepstral feature spaces outperformed linear cepstra and chroma. In the 111-word classification experiment, the F1 score on the test data set was 0.99 for the spectrum, 0.91 for the mel-scale cepstrum, 0.76 for the chromagram, and 0.64 for the cepstrum feature space.
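To make the notion of a 2D feature map concrete, the following is a minimal NumPy sketch of how a log-magnitude spectrogram and a linear (real) cepstrogram can be computed from a waveform. It is not the authors' pipeline: the function names, the 16 kHz sampling rate, the 400-sample frame, the 160-sample hop, and the number of retained cepstral coefficients are all assumptions chosen for illustration, and the mel-scale and chroma variants are omitted.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def log_spectrogram(x, frame_len=400, hop=160, eps=1e-10):
    """Log-magnitude spectrogram, shape (n_frames, frame_len // 2 + 1)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)  # eps avoids log(0)

def cepstrogram(x, frame_len=400, hop=160, n_coeffs=40):
    """Real cepstrum per frame: inverse FFT of the log-magnitude spectrum."""
    log_spec = log_spectrogram(x, frame_len, hop)
    cepstra = np.fft.irfft(log_spec, n=frame_len, axis=1)
    return cepstra[:, :n_coeffs]  # keep the low-quefrency coefficients

# Hypothetical input: one second of a 440 Hz tone at an assumed 16 kHz rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(signal)   # 2D map: time frames x frequency bins
ceps = cepstrogram(signal)       # 2D map: time frames x quefrency bins
```

Each resulting array is a time-by-feature matrix that can be treated like a single-channel image, which is the property that makes such maps natural inputs for a CNN.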