Article

Robust Beamforming for Speech Recognition Using DNN-Based Time-Frequency Masks Estimation

Journal

IEEE ACCESS
Volume 6, Pages 52385-52392

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2018.2870758

Keywords

Acoustic beamforming; multi-channel speech enhancement; deep neural network; robust speech recognition

Funding

  1. National Natural Science Foundation of China [61871265, 61401501]

This paper addresses the robust beamforming problem for speech recognition using a novel time-frequency mask estimator. The beamformer first uses a deep neural network (DNN) to estimate a time-frequency mask, from which the covariance matrices of the target speech and noise are computed. The beamformer coefficients are then obtained directly via generalized eigenvector decomposition. To achieve the accurate covariance matrix estimation that robust beamforming requires, we propose a DNN-based mask estimator that exploits the spatial features of the multi-channel microphone signals. The proposed estimator leverages the spatial information of the microphone array by using the multi-channel signals to estimate a speech-aware mask and a noise-aware mask simultaneously. With these target-specific masks, accurate covariance matrices of the target speech and noise can be estimated independently from the observations. Experiments on the CHiME-4 datasets demonstrate that, compared with the baseline toolkit (BeamformIt) and the winning system of the CHiME-3 challenge, the proposed method achieves better results in terms of both perceptual speech quality and speech recognition error rate.
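The pipeline the abstract describes has two steps. First, mask-weighted covariance matrices of speech and noise are estimated. Second, a generalized-eigenvector beamformer is computed from them. The minimal NumPy sketch below illustrates both steps under stated assumptions. It is not the authors' implementation. It assumes the speech-aware and noise-aware masks already exist; the DNN mask estimator is not reproduced, and random masks stand in for its outputs. It also assumes the STFT is laid out as (frequency, frame, microphone). The function names `masked_covariance`, `gev_weights`, and `apply_beamformer` are hypothetical. Any post-filtering or gain normalization a complete system might apply is omitted.

```python
import numpy as np

# Hypothetical helper names; a sketch of mask-based GEV beamforming,
# not the paper's code. STFT layout assumed: (F bins, T frames, M mics).


def masked_covariance(stft: np.ndarray, mask: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Mask-weighted spatial covariance matrix for every frequency bin.

    stft: complex array, shape (F, T, M).
    mask: real-valued time-frequency mask in [0, 1], shape (F, T).
    Returns Hermitian matrices of shape (F, M, M):
        Phi[f] = sum_t mask[f, t] * y[f, t] y[f, t]^H / sum_t mask[f, t]
    """
    # Entry (m, n) accumulates mask * y_m * conj(y_n), i.e. the outer product y y^H.
    weighted = np.einsum("ft,ftm,ftn->fmn", mask, stft, stft.conj())
    # Normalize by total mask weight per bin (eps guards against all-zero masks).
    norm = np.maximum(mask.sum(axis=1), eps)[:, None, None]
    return weighted / norm


def gev_weights(phi_xx: np.ndarray, phi_nn: np.ndarray) -> np.ndarray:
    """Principal generalized eigenvector of (phi_xx, phi_nn) per bin.

    Solves phi_xx[f] w = lambda * phi_nn[f] w and keeps the eigenvector with
    the largest eigenvalue, which maximizes w^H phi_xx w / w^H phi_nn w.
    Assumes phi_nn[f] is invertible (not regularized here).
    """
    num_bins, num_mics, _ = phi_xx.shape
    weights = np.empty((num_bins, num_mics), dtype=complex)
    for f in range(num_bins):
        # Equivalent standard eigenproblem: (phi_nn^{-1} phi_xx) w = lambda w.
        vals, vecs = np.linalg.eig(np.linalg.solve(phi_nn[f], phi_xx[f]))
        weights[f] = vecs[:, np.argmax(vals.real)]
    return weights


def apply_beamformer(stft: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Single-channel enhanced STFT: x_hat[f, t] = w[f]^H y[f, t]."""
    return np.einsum("fm,ftm->ft", weights.conj(), stft)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, T, M = 257, 100, 6
    y = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
    # Random stand-ins for the DNN's speech-aware and noise-aware masks.
    speech_mask = rng.uniform(0.0, 1.0, (F, T))
    noise_mask = rng.uniform(0.0, 1.0, (F, T))
    phi_xx = masked_covariance(y, speech_mask)
    phi_nn = masked_covariance(y, noise_mask)
    enhanced = apply_beamformer(y, gev_weights(phi_xx, phi_nn))
    print(enhanced.shape)  # (257, 100)
```

Each covariance matrix is computed from its own mask, so the speech and noise estimates are obtained independently from the same observation. This mirrors the abstract's motivation for estimating two target-specific masks instead of deriving the noise mask as the complement of the speech mask.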
