Article

Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition

Journal

IEEE ACCESS
Volume 10, Pages 30069-30079

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/ACCESS.2022.3159339

Keywords

Speech recognition; Task analysis; Recurrent neural networks; Computational modeling; Deep learning; Training; Standards; Automatic speech recognition; deep supervised learning; recurrent neural network; spoken English digit dataset

Abstract

Automatic speech recognition (ASR) is one of the most demanding tasks in natural language processing owing to its complexity. Recently, deep learning approaches have been deployed for this task and have been shown to outperform traditional machine learning approaches such as the artificial neural network (ANN). In particular, deep learning methods such as long short-term memory (LSTM) have achieved improved ASR performance. However, LSTM is limited in its ability to process continuous input streams. A traditional LSTM requires four linear (multilayer perceptron, MLP) layers per cell, demanding large memory bandwidth at each sequence time step. LSTM therefore cannot accommodate the many computational units required for processing continuous input streams, because the system lacks the memory bandwidth to feed those units. In this study, an enhanced deep learning LSTM recurrent neural network (RNN) model is proposed to resolve this shortcoming. In the proposed model, an RNN is incorporated as a forget gate into the memory block, allowing cell states to be reset at the beginning of sub-sequences. This enables the system to process continuous input streams efficiently without increasing the required memory bandwidth. The standard LSTM architecture is also modified to use the model parameters more effectively. Several CNN-based and sequential models were trained on the same dataset and compared with the proposed model. The LSTM-RNN outperformed the other deep learning models, achieving an accuracy of 99.36% on a well-established public benchmark spoken English digit dataset.
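
For context, the "four linear (MLP) layers per cell" in the abstract are the standard LSTM gate computations performed at every time step, which is the source of the per-step memory-bandwidth cost:

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

The Python sketch below is a minimal illustration of the reset mechanism the abstract describes, not the authors' implementation: where the paper incorporates an RNN as the forget gate so the reset is learned, the sketch substitutes an explicit reset flag, and all names (LSTMCellWithReset, reset, boundaries) are hypothetical.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCellWithReset:
    # Minimal LSTM cell. The stacked weight matrix packs the four linear
    # (MLP) maps per cell -- input gate, forget gate, output gate, and
    # candidate -- mentioned in the abstract. Hypothetical sketch: the
    # paper's learned RNN forget gate is replaced by a hard reset flag.
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)
        self.W = rng.uniform(-scale, scale,
                             (4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)
        self.H = hidden_size

    def step(self, x, h_prev, c_prev, reset=False):
        z = self.W @ np.concatenate([x, h_prev]) + self.b
        H = self.H
        i = sigmoid(z[:H])           # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:])       # candidate cell state
        if reset:
            f = np.zeros_like(f)     # clear carried state at a sub-sequence boundary
        c = f * c_prev + i * g       # cell-state update
        h = o * np.tanh(c)           # hidden state / output
        return h, c

# Usage: stream 100 dummy feature frames (e.g., 13 MFCCs each), resetting
# the cell state at assumed sub-sequence boundaries.
cell = LSTMCellWithReset(input_size=13, hidden_size=32)
h, c = np.zeros(32), np.zeros(32)
frames = np.random.default_rng(1).normal(size=(100, 13))
boundaries = {0, 40, 80}  # hypothetical boundary positions
for t, x in enumerate(frames):
    h, c = cell.step(x, h, c, reset=(t in boundaries))

Driving the forget gate to zero at a boundary makes c_t = i_t \odot \tilde{c}_t at that step, so the cell begins each sub-sequence with a fresh state while reusing the same weights and buffers, which is how a continuous stream can be handled without additional memory bandwidth.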
