☆ 4.6 Article

Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments

IEEE ACCESS (2021)

Journal

IEEE ACCESS

Volume 9, Issue -, Pages 136476-136486

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2021.3115608

Keywords

Speech recognition; Senior citizens; Linguistics; Hidden Markov models; Feature extraction; Bridges; Adaptation models; Speech recognition; voice translation; spectral feature transform; age-on-demand speech recognition

Funding

Institute of Information & Communications Technology Planning & Evaluation (IITP) - Korean Government [IITP-2021-2020-0-01808]
Ministry of Science and ICT (MIST)
MIST through the IITP Program (Development of Semi-Supervised Learning Language Intelligence Technology and Korean Tutoring Service for Foreigners) [2019-0-00004]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

This paper addresses the poor performance of the elderly in automatic speech recognition by proposing a neural network-based voice conversion framework with unsupervised phonology clustering. The study aims to enhance speech recognition accuracy for minority speakers, such as the elderly, through spectral feature adaptation.

We address a low-performance problem of the elderly in automatic speech recognition (ASR) through feature adaptation agnostic to the ASR. Most of the datasets for speech recognition models consist of datasets collected from adult speakers. Consequently, the majority of commercial speech recognition systems typically tend to perform well on adult speakers. In other words, the limited diversity of speakers in the training datasets yields unreliable performance for minority (e.g., elderly) speakers due to the infeasible acquisition of training data. In response, this paper suggests a neural network-based voice conversion framework to enhance speech recognition of the minority. To this end, we propose a voice translation model including an unsupervised phonology clustering to extract linguistic information to fit the minority's speech to a current acoustic model frame. Our proposal is a spectral feature adaptation method that can be placed in front of any commercial or open ASR system, avoiding directly modifying the speech recognizer. The experimental results and analysis demonstrate the effectiveness of our proposed method through improvement in elderly speech recognition accuracy.

Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments

Journal

IEEE ACCESS

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments

Journal

IEEE ACCESS

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper