Transfer learning from adult to children for speech recognition: Evaluation, analysis and recommendations

COMPUTER SPEECH AND LANGUAGE (2020)

Journal

COMPUTER SPEECH AND LANGUAGE

Volume 63

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

DOI: 10.1016/j.csl.2020.101077

Keywords

Analysis of children's speech; Children speech recognition; Automatic speech recognition; Deep learning; Transfer learning; Deep neural network

Funding

U.S. Army Medical Research Acquisition Activity, 820 Chandler Street, Fort Detrick MD
NSF
DoD
Office of the Assistant Secretary of Defense for Health Affairs through the Psychological Health and Traumatic Brain Injury Research Program [W81XWH-15-1-0632]
NIH

Abstract

Children's speech recognition is challenging mainly because of the inherently high variability in children's physical and articulatory characteristics and expressions. This variability manifests in both acoustic constructs and linguistic usage, owing to the rapidly changing developmental stages of childhood. Part of the challenge is also the lack of large amounts of available children's speech data for efficient modeling. This work attempts to address these key challenges using transfer learning from adult models to children's models in a Deep Neural Network (DNN) framework for the children's Automatic Speech Recognition (ASR) task, evaluating on multiple children's speech corpora with a large vocabulary. The paper presents a systematic and extensive analysis of the proposed transfer learning technique, considering the key factors affecting children's speech recognition identified in prior literature. Evaluations are presented on (i) comparisons of earlier GMM-HMM and newer DNN models, (ii) the effectiveness of standard adaptation techniques versus transfer learning, and (iii) various adaptation configurations for tackling the variabilities present in children's speech, in terms of (a) acoustic spectral variability and (b) pronunciation variability and linguistic constraints. Our analysis spans (i) the number of DNN model parameters (for adaptation), (ii) the amount of adaptation data, (iii) the ages of the children, and (iv) age-dependent versus age-independent adaptation. Finally, we provide recommendations on (i) favorable strategies across the analyzed parameters, and (ii) potential future research directions and the relevant challenges persisting in DNN-based ASR for children's speech. © 2020 Elsevier Ltd. All rights reserved.
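
As a rough illustration of the general idea the abstract refers to (reusing a model trained on plentiful source data and adapting only some of its parameters on a small target set), the toy sketch below may help. It is not the paper's setup: the regression task, the tiny one-hidden-layer network, the synthetic "source" and "target" data, and all function names (`forward`, `mse`, `train`, `sample`) are assumptions made purely for illustration.

```python
import math
import random

random.seed(0)

def forward(params, x):
    """One tanh hidden layer followed by a single linear output unit."""
    W1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return h, sum(w * hi for w, hi in zip(w2, h)) + b2

def mse(params, data):
    """Mean squared error of the network on a list of (input, target) pairs."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)

def train(params, data, lr=0.05, epochs=100, freeze_hidden=False):
    """Plain SGD on squared error. With freeze_hidden=True only the output
    layer is updated, so the hidden layer is transferred unchanged."""
    W1 = [row[:] for row in params[0]]
    b1, w2, b2 = params[1][:], params[2][:], params[3]
    for _ in range(epochs):
        for x, t in data:
            h, y = forward((W1, b1, w2, b2), x)
            err = y - t
            # Back-propagated error at the hidden units, computed before w2 changes.
            dh = [err * w2[j] * (1.0 - h[j] ** 2) for j in range(len(h))]
            for j in range(len(h)):
                w2[j] -= lr * err * h[j]
            b2 -= lr * err
            if not freeze_hidden:
                for j in range(len(h)):
                    for i in range(len(x)):
                        W1[j][i] -= lr * dh[j] * x[i]
                    b1[j] -= lr * dh[j]
    return (W1, b1, w2, b2)

def sample(n, scale, shift):
    """Synthetic data: the target task is a scaled, shifted variant of the source task."""
    data = []
    for _ in range(n):
        x = [random.uniform(-1, 1), random.uniform(-1, 1)]
        data.append((x, scale * (math.sin(x[0]) + 0.5 * x[1]) + shift))
    return data

source_data = sample(200, 1.0, 0.0)   # plentiful source data (stand-in for adult speech)
target_data = sample(20, 1.3, 0.2)    # scarce target data (stand-in for children's speech)

init = ([[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(4)],
        [0.0] * 4,
        [random.uniform(-0.5, 0.5) for _ in range(4)],
        0.0)

source = train(init, source_data)                          # train on the source task
before = mse(source, target_data)                          # source model applied as-is
adapted = train(source, target_data, freeze_hidden=True)   # adapt the output layer only
after = mse(adapted, target_data)                          # error on the adaptation data
print(before, after)
```

Which parameters are frozen and which are adapted is the knob this sketch exposes through `freeze_hidden`; the abstract lists the number of adapted DNN parameters and the amount of adaptation data among the factors the paper analyzes, but the specific configurations it studies are not given here.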
