Article

Phone duration modeling for speaker age estimation in children

Journal

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
Volume 152, Issue 5, Pages 3000-3009

Publisher

ACOUSTICAL SOC AMER AMER INST PHYSICS
DOI: 10.1121/10.0015198

Automatic inference of paralinguistic information from speech, such as age, is an important area of research with many technological applications. Speaker age estimation can help with age-appropriate curation of information content and personalized interactive experiences. However, automatic speaker age estimation in children is challenging because of the paucity of speech data representing the developmental spectrum and the large signal variability, even within a given age group. Most prior approaches to child speaker age estimation adopt methods drawn directly from research on adult speech. In this paper, we propose a novel technique that exploits the temporal variability present in children's speech to estimate children's age. We focus on phone durations as a biomarker of children's age. Phone duration distributions are derived by force-aligning children's speech with transcripts. Regression models are trained to predict speaker age among children studying in kindergarten up to grade 10. Experiments on two children's speech datasets demonstrate the robustness and portability of the proposed features across multiple domains with varying signal conditions. The phonemes that contribute most to the estimation of child speaker age are analyzed and presented. Experimental results suggest that phone durations contain important developmental information about children. The proposed features are also suited to low-data scenarios. © 2022 Acoustical Society of America.
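
The pipeline the abstract outlines (forced alignment, per-phone duration statistics, regression on age) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the aligner, the duration statistics, or the regression model. The sketch assumes alignments are already available as hypothetical (phone, start, end) tuples, summarizes each phone by its mean duration, and fits a one-feature least-squares line. The speaker IDs, ages, and timings are invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def phone_duration_features(alignment):
    """Return {phone: mean duration in seconds} for one speaker.

    `alignment` is a list of (phone, start_sec, end_sec) tuples. This tuple
    layout is an assumption about what a forced aligner would provide.
    """
    durations = defaultdict(list)
    for phone, start, end in alignment:
        durations[phone].append(end - start)
    return {phone: mean(values) for phone, values in durations.items()}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Invented toy data: speaker id -> (age in years, alignment). Not real results.
speakers = {
    "spk_a": (6.0, [("AA", 0.00, 0.20), ("S", 0.20, 0.38), ("AA", 0.38, 0.60)]),
    "spk_b": (10.0, [("AA", 0.00, 0.16), ("S", 0.16, 0.30), ("AA", 0.30, 0.46)]),
    "spk_c": (14.0, [("AA", 0.00, 0.12), ("S", 0.12, 0.24), ("AA", 0.24, 0.36)]),
}

# Use the mean duration of a single phone ("AA") as the only feature.
ages = [age for age, _ in speakers.values()]
aa_means = [phone_duration_features(ali)["AA"] for _, ali in speakers.values()]

slope, intercept = fit_line(aa_means, ages)

# Predict the age of an unseen speaker from their mean "AA" duration.
new_alignment = [("AA", 0.00, 0.18), ("S", 0.18, 0.33)]
predicted_age = slope * phone_duration_features(new_alignment)["AA"] + intercept
print(f"predicted age: {predicted_age:.1f} years")
```

A realistic system would use one feature per phone (or richer distribution statistics) and a multivariate regressor; the single-feature line here only keeps the example dependency-free.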
