Proceedings Paper

Homophone Disambiguation Profits from Durational Information

INTERSPEECH 2022 (2022)

Journal

INTERSPEECH 2022

Volume -, Issue -, Pages 3198-3202

Publisher

ISCA-INT SPEECH COMMUNICATION ASSOC

DOI: 10.21437/Interspeech.2022-10109

Keywords

homophone disambiguation; prosodic features; Random Forest; CNN; conversational speech; Austrian German

Categories

Acoustics; Audiology & Speech-Language Pathology; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Funding

Austrian Science Fund (FWF) [V-638-N33]

Summary

The high degree of segmental reduction in conversational speech leads to a large number of words becoming homophones, which poses a challenge for automatic speech recognition. This study proposes two approaches, one based on prosodic and spectral features and the other based on a convolutional neural network, to disambiguate homophones. The results show potential for both approaches, especially when combined with a stochastic language model as part of an ASR system.

Abstract

Given the high degree of segmental reduction in conversational speech, a large number of words become homophonous that are not homophonous in read speech. For instance, the tokens considered in this study (ah, ach, auch, eine and er) may all be reduced to [a] in conversational Austrian German. Homophones pose a serious problem for automatic speech recognition (ASR), where homophone disambiguation is typically solved using lexical context. In contrast, we propose two approaches to disambiguate homophones on the basis of prosodic and spectral features. First, we build a Random Forest classifier with a large set of acoustic features, which reaches good performance given the small data size and allows us to gain insight into how these homophones differ in phonetic detail. Because extracting the features requires annotations, this approach would not be practical for integration into an ASR system. We thus explored a second approach based on a convolutional neural network (CNN). Its performance is on par with that of the Random Forest approach, and the results indicate a high potential for facilitating homophone disambiguation when combined with a stochastic language model as part of an ASR system.
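The abstract does not say how an acoustic homophone classifier would be combined with a stochastic language model. One common way to do this, shown here purely as an illustrative sketch and not as the authors' method, is a log-linear combination of the two probability distributions over the candidate words. The function name `combine`, the `lm_weight` parameter, and all probability values below are hypothetical and chosen only for the example.

```python
import math


def combine(acoustic_probs, lm_probs, lm_weight=1.0):
    """Log-linearly combine acoustic and language-model probabilities.

    Hypothetical helper (not from the paper):
        score(w) = log P_acoustic(w) + lm_weight * log P_lm(w)
    The scores are renormalised so the result sums to 1.
    Assumes every probability is strictly positive.
    """
    scores = {
        w: math.log(acoustic_probs[w]) + lm_weight * math.log(lm_probs[w])
        for w in acoustic_probs
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    unnorm = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(unnorm.values())
    return {w: v / total for w, v in unnorm.items()}


# Made-up numbers for illustration only; they are not results from the paper.
# Keys are the five homophone candidates named in the abstract.
acoustic = {"ah": 0.30, "ach": 0.10, "auch": 0.35, "eine": 0.15, "er": 0.10}
lm = {"ah": 0.05, "ach": 0.05, "auch": 0.20, "eine": 0.60, "er": 0.10}

posterior = combine(acoustic, lm)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

With these invented inputs, the acoustic scores alone favour "auch", while the combined score favours "eine" because the hypothetical language model strongly prefers it in context. Setting `lm_weight=0.0` recovers the acoustic-only decision, which is one way to check how much the lexical context contributes.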
