Proceedings Paper

Experimental Evaluation of CNN Architecture for Speech Recognition

Publisher

SPRINGER-VERLAG SINGAPORE PTE LTD
DOI: 10.1007/978-981-15-0029-9_40

Keywords

Convolution Neural Networks (CNN); Deep Neural Networks (DNN); Kernel; Mel-Frequency Cepstral Coefficients (MFCC); Speech Recognition

In recent years, deep learning has been widely used in signal and information processing. Among deep learning algorithms, the Convolutional Neural Network (CNN) has been widely used for image recognition and classification because of its architecture, high accuracy, and efficiency. This paper proposes a method that applies a CNN to audio samples rather than to the image samples on which CNNs are usually trained. The one-dimensional audio samples are converted into two-dimensional data: a matrix of Mel-Frequency Cepstral Coefficients (MFCCs) extracted from the audio, whose dimensions are the number of coefficients and the number of windows used in the extraction. The proposed CNN model is evaluated on the TIDIGITS corpus. The paper analyzes several convolution-layer architectures, each with a different number of feature maps. The three-layer convolution architecture achieved the highest accuracy, 97.46%, among the architectures discussed.
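The conversion from one-dimensional audio to a two-dimensional MFCC matrix can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation: the sample rate, frame length, hop size, FFT size, filter count, and number of coefficients are all assumed values chosen for illustration, and the function names are hypothetical. The naive DFT is slow and is used only to keep the example self-contained.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping, Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
           for n in range(frame_len)]
    return [[x[i * hop + n] * win[n] for n in range(frame_len)]
            for i in range(n_frames)]

def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum for bins 0..n_fft/2 (frame is zero-padded)."""
    out = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * n / n_fft)
                 for n, v in enumerate(frame))
        im = -sum(v * math.sin(2 * math.pi * k * n / n_fft)
                  for n, v in enumerate(frame))
        out.append((re * re + im * im) / n_fft)
    return out

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    top = hz_to_mel(sr / 2)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mels]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        for k in range(lo, mid):          # rising edge of the triangle
            bank[j - 1][k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge of the triangle
            bank[j - 1][k] = (hi - k) / (hi - mid)
    return bank

def mfcc_matrix(x, sr=8000, frame_len=200, hop=80,
                n_fft=256, n_filters=20, n_mfcc=13):
    """Return a list of rows: one row per window, one column per coefficient."""
    bank = mel_filterbank(n_filters, n_fft, sr)
    rows = []
    for frame in frame_signal(x, frame_len, hop):
        spec = power_spectrum(frame, n_fft)
        # Log filterbank energies; the floor avoids log(0) for empty filters.
        log_e = [math.log(max(sum(w * p for w, p in zip(f, spec)), 1e-10))
                 for f in bank]
        # Unnormalized DCT-II of the log energies gives the cepstral coefficients.
        rows.append([sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                         for m, e in enumerate(log_e))
                     for c in range(n_mfcc)])
    return rows

# Assumed test input: 0.1 s of a 440 Hz tone sampled at 8 kHz.
sr = 8000
signal = [math.sin(2 * math.pi * 440 * t / sr) for t in range(800)]
features = mfcc_matrix(signal, sr=sr)
print(len(features), len(features[0]))  # 8 windows x 13 coefficients
```

The resulting windows-by-coefficients matrix plays the role that a single-channel image would play as input to a 2-D convolution layer. Audio clips of different lengths yield different numbers of windows, so a fixed-size CNN input would additionally need padding or truncation, which this sketch leaves out.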
