☆ 4.6 Article

Modeling Speech Sound Radiation With Different Degrees of Realism for Articulatory Synthesis

IEEE ACCESS (2022)

期刊

IEEE ACCESS

卷 10, 期 -, 页码 95008-95019

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2022.3204816

关键词

Mouth; Solid modeling; Mathematical models; Synthesizers; Speech recognition; Microphones; Radiation characteristic; articulatory synthesis

类别

Computer Science, Information Systems Engineering, Electrical & Electronic Telecommunications

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

Articulatory synthesis is a technique that models the physical phenomena of speech production. This study simulated different radiation characteristics in synthesized sentences and evaluated their naturalness. The results indicate that the similarity of the synthesized sentences' spectra to real speech affects their perceived naturalness.

Articulatory synthesis is based on modeling various physical phenomena of speech production, including sound radiation from the mouth. With regard to sound radiation, the most common approach is to approximate it in terms of a simple spherical source of strength equal to the mouth volume velocity. However, because this approximation is only valid at very low frequencies and does not account for the diffraction by the head and the torso, we simulated two alternative radiation characteristics that are potentially more realistic: the radiation from a vibrating piston in a spherical baffle, and the radiation from the mouth of a detailed model of the human head and torso. Using the articulatory speech synthesizer VocalTractLab, a corpus of 10 sentences was synthesized with the different radiation characteristics combined with three different phonation types. The synthesized sentences were acoustically compared with natural recordings of the same sentences in terms of their long-term average spectra (LTAS), and evaluated in terms of their naturalness and intelligibility. The intelligibility was not affected by the type of radiation characteristic. However, it was found that the more similar their LTAS was to real speech, the more natural the synthetic sentences were perceived to be. Hence, the naturalness was not directly determined by the realism of the radiation characteristic, but by the combined spectral effect of the radiation characteristic and the voice source. While the more realistic radiation models do not per se improve synthesis quality, they provide new insights in the study of speech production and articulatory synthesis.

Modeling Speech Sound Radiation With Different Degrees of Realism for Articulatory Synthesis

期刊

IEEE ACCESS

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Modeling Speech Sound Radiation With Different Degrees of Realism for Articulatory Synthesis

期刊

IEEE ACCESS

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文