Article

Appearance and shape-based hybrid visual feature extraction: toward audio-visual automatic speech recognition

Journal

SIGNAL IMAGE AND VIDEO PROCESSING
Volume 15, Issue 1, Pages 25-32

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s11760-020-01717-0

Keywords

AV-ASR; Appearance and shape-based hybrid visual speech features; LBP-TOP; DCT; PZM; Hybrid classifier (classifier combination)

This research introduces a new set of hybrid visual features that combines shape-based and appearance-based features to improve visual speech recognition. The shape-based feature is the Pseudo-Zernike Moment (PZM); the appearance-based features are Local Binary Pattern-three orthogonal planes (LBP-TOP) and the Discrete Cosine Transform (DCT). The goal is to embed both global and local visual information in a compact feature set.
Audio-visual automatic speech recognition (AV-ASR) is an emerging field of research, but there is still a lack of suitable visual features for visual speech recognition. Visual features are mainly categorized as shape-based or appearance-based. Because shape and appearance features embed different information, this paper proposes a new set of hybrid visual features that leads to a better visual speech recognition system. Pseudo-Zernike Moment (PZM) is calculated as the shape-based visual feature, while Local Binary Pattern-three orthogonal planes (LBP-TOP) and Discrete Cosine Transform (DCT) are calculated as the appearance-based features. The proposed method also gathers both global and local visual information. The objective of the proposed system is therefore to embed all of this visual information into a compact feature set. For audio speech recognition, the proposed system uses Mel-frequency cepstral coefficients (MFCC). We also propose a hybrid classification method to carry out all the AV-ASR experiments. Artificial Neural Network (ANN), multiclass Support Vector Machine (SVM) and Naive Bayes (NB) classifiers are used for classifier hybridization. It is shown that the proposed AV-ASR system with a hybrid classifier significantly improves the recognition rate.
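
The abstract does not state how the ANN, SVM and NB outputs are combined. As an illustration only, the sketch below shows one common form of classifier combination, majority voting over per-sample labels. The function names `majority_vote` and `combine_predictions`, the tie-breaking rule, and the example labels are all assumptions made for this sketch and are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the classifiers' votes.

    Assumption (not from the paper): ties are resolved in favour of the
    tied label that appears earliest in `labels`, so classifiers should
    be listed from most to least trusted.
    """
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


def combine_predictions(*per_classifier):
    """Apply majority voting sample by sample.

    Each argument is one classifier's list of predicted labels;
    all lists must have the same length.
    """
    return [majority_vote(list(votes)) for votes in zip(*per_classifier)]


# Hypothetical predictions for three utterances (illustrative labels only).
ann_pred = ["zero", "one", "two"]
svm_pred = ["zero", "two", "two"]
nb_pred = ["one", "three", "two"]

hybrid = combine_predictions(ann_pred, svm_pred, nb_pred)
print(hybrid)  # ['zero', 'one', 'two']
```

In the second utterance all three classifiers disagree, so the assumed tie-break falls back to the first listed classifier. Weighted voting or score-level fusion are equally plausible readings of "classifier hybridization"; the full paper would need to be consulted for the actual scheme.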
