Article

Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING (2015)

Journal

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING

Publisher

SPRINGEROPEN

DOI: 10.1186/s13636-015-0056-7

Keywords

Speaker recognition; Bottleneck features; Denoising autoencoder; Deep neural network; Reverberant speech

Funding

Kayamori Foundation of Informational Science Advancement
Grants-in-Aid for Scientific Research [15K16020] Funding Source: KAKEN

Abstract

Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral-domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a combination of these two approaches is proposed. For the DNN-based bottleneck feature, we noted that DNNs can transform the reverberant speech feature into a new feature space with greater discriminative classification ability for distant-talking speaker recognition. Conversely, cepstral-domain DAE-based dereverberation tries to suppress reverberation by mapping the cepstrum of reverberant speech to that of clean speech, with the expectation of improving distant-talking speaker recognition performance. Since the DNN-based discriminant bottleneck feature and DAE-based dereverberation are strongly complementary, combining the two methods is expected to be very effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set whose reverberant environments differed from the training environments. In suppressing late reverberation, our method outperformed some state-of-the-art dereverberation approaches such as multichannel least mean squares (MCLMS). Compared with MCLMS, we obtained relative error rate reductions of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, combining the likelihoods from the DNN-based bottleneck feature and the DAE-based dereverberation further improved performance.
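The abstract says the likelihoods of the two systems are combined but does not state the combination rule. The following is a minimal sketch that assumes a common choice: linear interpolation of per-speaker log-likelihoods with a weight `alpha`, followed by closed-set identification by picking the highest-scoring speaker. The function names, the weight, and the toy scores are hypothetical and are not taken from the paper. The helper `relative_error_reduction` uses the standard definition (baseline minus new) / baseline, which is how relative figures such as the 21.4% and 47.0% above are normally computed.

```python
from typing import List, Sequence

# Rows = test utterances, columns = enrolled speakers (hypothetical layout).
Scores = Sequence[Sequence[float]]


def fuse_log_likelihoods(ll_bottleneck: Scores, ll_dae: Scores,
                         alpha: float = 0.5) -> List[List[float]]:
    """Assumed fusion rule: alpha * bottleneck + (1 - alpha) * DAE log-likelihood."""
    if len(ll_bottleneck) != len(ll_dae):
        raise ValueError("score matrices must have the same number of utterances")
    fused = []
    for row_bn, row_dae in zip(ll_bottleneck, ll_dae):
        if len(row_bn) != len(row_dae):
            raise ValueError("score matrices must have the same number of speakers")
        fused.append([alpha * a + (1.0 - alpha) * b for a, b in zip(row_bn, row_dae)])
    return fused


def identify(scores: Scores) -> List[int]:
    """Closed-set identification: index of the highest-scoring speaker per utterance."""
    decisions = []
    for row in scores:
        best = 0
        for j in range(1, len(row)):
            if row[j] > row[best]:
                best = j
        decisions.append(best)
    return decisions


def relative_error_reduction(baseline_err: float, new_err: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline_err - new_err) / baseline_err


if __name__ == "__main__":
    # Toy log-likelihoods for 2 utterances and 3 speakers (made up for illustration).
    ll_bn = [[-10.0, -12.0, -11.0], [-9.0, -8.5, -9.5]]
    ll_dae = [[-11.0, -10.5, -12.0], [-9.2, -8.0, -10.0]]
    fused = fuse_log_likelihoods(ll_bn, ll_dae, alpha=0.5)
    print(identify(fused))                       # [0, 1]
    print(relative_error_reduction(20.0, 15.0))  # 0.25
```

In practice the weight would be tuned on development data. Because the two front ends can produce log-likelihoods with different dynamic ranges, some form of score normalization before interpolation may also be needed; this sketch omits it.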
