Article

Automatic Lip-Reading System Based on Deep Convolutional Neural Network and Attention-Based Long Short-Term Memory

Journal

APPLIED SCIENCES-BASEL
Volume 9, Issue 8, Article 1599 (2019)

Publisher

MDPI
DOI: 10.3390/app9081599

Keywords

virtual reality (VR); self-attention; automatic lip-reading; sensory input; deep learning

Funding

  1. National Natural Science Foundation of China [61571013]
  2. Beijing Natural Science Foundation of China [4143061]
  3. Science and Technology Development Program of Beijing Municipal Education Commission [KM201710009003]
  4. Great Wall Scholar Reserved Talent Program of North China University of Technology [NCUT2017XN018013]

Abstract

With the improvement of computer performance, virtual reality (VR), as a new mode of visual operation and interaction, gives automatic lip-reading technology based on visual features broad development prospects. In an immersive VR environment, the user's state can be captured through lip movements, allowing the user's thinking to be analyzed in real time. Owing to complex image processing, hard-to-train classifiers, and long recognition times, traditional lip-reading systems struggle to meet the requirements of practical applications. In this paper, a convolutional neural network (CNN) used for image feature extraction is combined with a recurrent neural network (RNN) based on an attention mechanism for automatic lip-reading recognition. Our proposed method can be divided into three steps. First, we extract keyframes from our own independently established database (English pronunciation of the numbers zero to nine by three males and three females). Then, we use the Visual Geometry Group (VGG) network to extract lip image features; the extracted features prove fault-tolerant and effective. Finally, we compare two lip-reading models: (1) a fusion model with an attention mechanism and (2) a fusion model of two networks. The results show that the accuracy of the proposed model is 88.2% on the test dataset, versus 84.9% for the contrastive model. Therefore, our proposed method is superior to traditional lip-reading methods and general neural networks.
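The paper itself does not ship code; as a rough illustration of the architecture the abstract describes, a minimal PyTorch sketch of the VGG + attention-LSTM fusion could look like the following. The hidden size (256), the 64x64 lip-crop resolution, the additive attention pooling, and the `AttentionLipReader` class name are all illustrative assumptions, not the authors' actual settings.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class AttentionLipReader(nn.Module):
    """Illustrative sketch of the described pipeline: per-frame VGG
    features feed an LSTM, a learned attention layer pools the hidden
    states over time, and a 10-way classifier predicts the spoken digit.
    Layer sizes are guesses, not the paper's settings."""

    def __init__(self, num_classes=10, hidden_size=256):
        super().__init__()
        # VGG-16 convolutional backbone as the frame feature extractor.
        self.backbone = vgg16(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)            # -> (B*T, 512, 1, 1)
        self.lstm = nn.LSTM(512, hidden_size, batch_first=True)
        # Additive attention: one scalar score per time step.
        self.attn = nn.Linear(hidden_size, 1)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, frames):                          # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = frames.reshape(b * t, *frames.shape[2:])    # fold time into batch
        feats = self.pool(self.backbone(x)).flatten(1).reshape(b, t, 512)
        hidden, _ = self.lstm(feats)                    # (B, T, hidden_size)
        weights = torch.softmax(self.attn(hidden), dim=1)  # (B, T, 1)
        context = (weights * hidden).sum(dim=1)         # attention-weighted pooling
        return self.classifier(context)                 # (B, num_classes)

model = AttentionLipReader()
clip = torch.randn(2, 16, 3, 64, 64)  # 2 clips of 16 keyframes, 64x64 lip crops
logits = model(clip)
print(logits.shape)                   # torch.Size([2, 10])
```

The attention-weighted sum over LSTM hidden states is what distinguishes the proposed model from the contrastive fusion model, which would instead use only the final hidden state (or a plain average) before classification.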
