Article

Robust Beamforming for Speech Recognition Using DNN-Based Time-Frequency Masks Estimation

IEEE ACCESS (2018)

Journal

IEEE ACCESS

Volume 6, Pages 52385-52392

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2018.2870758

Keywords

Acoustic beamforming; multi-channel speech enhancement; deep neural network; robust speech recognition

Funding

National Natural Science Foundation of China [61871265, 61401501]

Abstract

This paper addresses the robust beamforming problem for speech recognition using a novel time-frequency mask estimator. The beamformer first estimates the time-frequency mask using a deep neural network (DNN), from which the covariance matrices of the target speech and noise are computed. The beamformer coefficients are then obtained directly via generalized eigenvector decomposition. To achieve accurate covariance matrix estimation for robust beamforming, we propose a DNN-based mask estimator that exploits the spatial features of the multi-channel microphone signals. The proposed mask estimator leverages the spatial information of the microphone array by using multi-channel signals to estimate a speech-aware mask and a noise-aware mask simultaneously. Using the target-specified masks, accurate covariance matrices of the target speech and noise can be obtained from the observation independently. Experiments on the CHiME4 data sets demonstrate that, compared with the baseline toolkit (BeamformIt) and the winner of the CHiME3 challenge, the proposed method achieves better results in terms of both perceptual speech quality and speech recognition error rate.
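The pipeline described in the abstract (masks, then masked covariance matrices, then a generalized eigenvector per frequency) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the DNN mask estimator is not reproduced, so the speech and noise masks are assumed to be given as arrays. The function names `masked_covariance`, `gev_weights`, and `apply_beamformer`, the `(F, T, M)` array layout, and the diagonal loading are all assumptions made for this example. The generalized eigenproblem is solved here by Cholesky whitening of the noise covariance, which is one common numerical route and may differ from what the paper uses.

```python
import numpy as np


def masked_covariance(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency.

    Y:    complex STFT, shape (F, T, M) = (frequency, frame, microphone)
    mask: real values in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M).
    """
    # Sum over frames of mask(f, t) * y(f, t) y(f, t)^H
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    return num / (mask.sum(axis=1)[:, None, None] + eps)


def gev_weights(phi_s, phi_n, diag_load=1e-6):
    """Principal generalized eigenvector of (phi_s, phi_n) for each frequency.

    Solves phi_s w = lambda * phi_n w by whitening with the Cholesky factor
    of phi_n. The diagonal loading is an assumption added for numerical
    stability, not something stated in the abstract.
    """
    F, M, _ = phi_s.shape
    W = np.empty((F, M), dtype=complex)
    for f in range(F):
        load = diag_load * np.trace(phi_n[f]).real / M + 1e-12
        L = np.linalg.cholesky(phi_n[f] + load * np.eye(M))
        L_inv = np.linalg.inv(L)
        C = L_inv @ phi_s[f] @ L_inv.conj().T
        C = 0.5 * (C + C.conj().T)  # enforce Hermitian symmetry
        _, vecs = np.linalg.eigh(C)  # eigenvalues in ascending order
        W[f] = L_inv.conj().T @ vecs[:, -1]  # map back from whitened space
    return W


def apply_beamformer(W, Y):
    """Compute w(f)^H y(f, t) for every time-frequency bin; returns (F, T)."""
    return np.einsum("fm,ftm->ft", W.conj(), Y)
```

Given masks `speech_mask` and `noise_mask` from any estimator, the enhanced STFT would be `apply_beamformer(gev_weights(masked_covariance(Y, speech_mask), masked_covariance(Y, noise_mask)), Y)`. Note that a generalized eigenvector is only defined up to a complex scale per frequency, so a practical system typically adds some normalization step, which is left out of this sketch.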
