Article

Wavelet feature selection of audio and imagined/vocalized EEG signals for ANN based multimodal ASR system

Journal

BIOMEDICAL SIGNAL PROCESSING AND CONTROL
Volume 63, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.bspc.2020.102218

Keywords

EEG; Vocalized speech recognition; Imagined speech recognition; Wavelet transform; ANN

Funding

  1. University Grants Commission, India


The research introduces an Automatic Speech Recognition system based on audio and EEG signals, demonstrating that combining different modalities can enhance speech recognition accuracy. The results indicate the possibility of speech recognition from EEG signals and the potential for improving recognition rate by fusing audio with EEG.
Human-Machine Interaction (HMI) systems demand multiple modalities for correct interaction. Research on these systems started with audio signals for speech recognition and is now progressing toward the incorporation of other biosignals. Accordingly, this paper presents an Automatic Speech Recognition (ASR) system based on single and multiple modalities, using audio and Electroencephalogram (EEG) signals to explore speech recognition. It extracts speech information concealed in audio and in ten channels of imagined EEG (EEG-i) and vocalized EEG (EEG-v) signals. Three Wavelet Transform (WT) methods, namely the Discrete Wavelet Transform (DWT), Wavelet Packet Decomposition (WPD), and a hybrid of DWT and WPD (DWPD), are used with four-level decomposition to transform the signals into WT coefficients. Six statistical parameters are then computed from the WT coefficients to generate 63 (2^6 - 1) feature vectors for each method. An exhaustive search over these 63 feature vectors determines the parameter combination that attains the best accuracy with an ANN classifier. Accuracy is further improved by applying five-level decomposition to the WPD coefficients together with the best parameter combination. Results include the accuracy of both unimodal and multimodal ASR. The WPD method achieved the best accuracies of 74.48%, 56.29%, 42.02%, 77.97%, and 78.90% for multiclass classification of prompts+words based on audio, EEG-i, EEG-v, audio + EEG-i, and audio + EEG-v, respectively. This indicates that speech recognition is possible from EEG signals and that fusing audio with EEG enhances the recognition rate over either modality alone. The results also show that the proposed method outperforms other methods in the area.
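The feature-selection pipeline described above can be sketched in a few lines. This is an illustrative approximation only: it uses a hand-rolled Haar DWT as a stand-in for the paper's unspecified wavelet family, and the six statistical parameters chosen here (mean, standard deviation, variance, energy, min, max) are a plausible assumption, not the paper's confirmed list. The key point it demonstrates is the search space: six parameters yield 2^6 - 1 = 63 non-empty parameter subsets, hence 63 candidate feature vectors per WT method.

```python
import itertools
import numpy as np

def haar_dwt(signal, levels=4):
    """Simplified Haar DWT: returns sub-bands [cA_n, cD_n, ..., cD_1].
    (Illustrative stand-in for the DWT/WPD decompositions in the paper.)"""
    coeffs = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx = approx[: len(approx) // 2 * 2]  # force even length
        pairs = approx.reshape(-1, 2)
        coeffs.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))  # detail
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)        # approximation
    coeffs.append(approx)
    return coeffs[::-1]

# Six statistical parameters per sub-band (assumed for illustration).
PARAMS = {
    "mean":   np.mean,
    "std":    np.std,
    "var":    np.var,
    "energy": lambda c: float(np.sum(c ** 2)),
    "min":    np.min,
    "max":    np.max,
}

def feature_vector(signal, param_names, levels=4):
    """Concatenate the chosen statistics over all wavelet sub-bands."""
    return np.array([PARAMS[p](band)
                     for band in haar_dwt(signal, levels)
                     for p in param_names])

# Exhaustive search space: every non-empty subset of the six parameters.
subsets = [combo
           for r in range(1, len(PARAMS) + 1)
           for combo in itertools.combinations(PARAMS, r)]
print(len(subsets))  # 63 candidate feature vectors per WT method
```

In the paper, each of the 63 candidate vectors would be scored by training the ANN classifier and keeping the subset with the highest accuracy; the classifier itself is omitted here.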
