Review

Reconsidering Read and Spontaneous Speech: Causal Perspectives on the Generation of Training Data for Automatic Speech Recognition

INFORMATION (2023)

Journal

INFORMATION

Volume 14, Issue 2, Pages -

Publisher

MDPI

DOI: 10.3390/info14020137

Keywords

automatic speech recognition; causality; speaking styles; data generation processes; annotation

Category

Computer Science, Information Systems

AI-generated summary

Superficially, read and spontaneous speech are the two main types of training data for automatic speech recognition, but they differ fundamentally in how the audio signal and its text are generated. This review introduces causal reasoning into automatic speech recognition, highlighting the impact of data generation processes on inference and performance. Taking a causal perspective, it discusses the relationship between data generation mechanisms, learning, and prediction in speech data. Furthermore, the authors argue that a causal perspective can improve the understanding of models in speech processing.

Abstract

Superficially, read and spontaneous speech (the two main kinds of training data for automatic speech recognition) appear complementary but are equal in form: pairs of texts and acoustic signals. Yet spontaneous speech is typically harder to recognize. This is usually explained by different kinds of variation and noise, but a more fundamental difference is at play: for read speech, the audio signal is produced by reciting a given text, whereas for spontaneous speech, the text is transcribed from a given signal. In this review, we embrace this difference by presenting a first introduction of causal reasoning into automatic speech recognition and by describing causality as a tool to study speaking styles and training data. After breaking down the data generation processes of read and spontaneous speech and analysing the domain from a causal perspective, we highlight how data generation by annotation must affect the interpretation of inference and performance. Our work discusses how various results from the causality literature on the impact of the direction of data generation mechanisms on learning and prediction apply to speech data. Finally, we argue that a causal perspective can support the understanding of models in speech processing with regard to their behaviour, capabilities, and limitations.
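
To make the difference in generation direction concrete, here is a minimal toy sketch in Python. It is not the authors' method: it assumes a three-word vocabulary and one scalar "acoustic" feature per word, and the names `VOCAB`, `Sample`, `synthesize`, `transcribe`, `read_speech`, and `spontaneous_speech` are hypothetical. In the read-speech process the text is sampled first and the signal is generated from it (text causes audio). In the spontaneous-speech process the signal exists first and an annotator derives the text from it (audio causes text). Both processes return the same kind of text/audio pair.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical toy "acoustics": each word maps to one scalar feature value.
VOCAB = {"yes": 1.0, "no": -1.0, "maybe": 0.0}


@dataclass
class Sample:
    text: str
    audio: List[float]  # toy stand-in for an acoustic signal
    cause: str          # which variable was generated first


def synthesize(text: str, rng: random.Random, noise: float = 0.1) -> List[float]:
    """Toy speaker: produces a signal from a given text (text -> audio)."""
    return [VOCAB[w] + rng.gauss(0.0, noise) for w in text.split()]


def transcribe(audio: List[float]) -> str:
    """Toy annotator: writes text for a given signal (audio -> text)
    by choosing the vocabulary entry nearest to each feature."""
    return " ".join(min(VOCAB, key=lambda w: abs(VOCAB[w] - x)) for x in audio)


def read_speech(rng: random.Random, n_words: int = 3) -> Sample:
    # The prompt text is the cause; the signal results from reciting it.
    text = " ".join(rng.choice(list(VOCAB)) for _ in range(n_words))
    return Sample(text=text, audio=synthesize(text, rng), cause="text")


def spontaneous_speech(rng: random.Random, n_words: int = 3) -> Sample:
    # The signal is the cause; the text is derived afterwards by annotation.
    audio = [rng.uniform(-1.5, 1.5) for _ in range(n_words)]
    return Sample(text=transcribe(audio), audio=audio, cause="audio")


if __name__ == "__main__":
    rng = random.Random(0)
    print(read_speech(rng))
    print(spontaneous_speech(rng))
```

Both functions yield a `Sample` with a text and a list of features. Only the toy `cause` field records which variable came first. In a real corpus this information is not stored in the pairs themselves.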
