☆ 4.5 Article

Code-switched automatic speech recognition in five South African languages

COMPUTER SPEECH AND LANGUAGE (2022)

期刊

COMPUTER SPEECH AND LANGUAGE

卷 71, 期 -, 页码 -

出版社

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

DOI: 10.1016/j.csl.2021.101262

关键词

Code-switching; Under-resourced languages; African languages; Bantu languages; Speech recognition; TDNN-BLSTM; TDNN-F

类别

Computer Science, Artificial Intelligence

资金

Department of Arts and Culture (DAC) of the South African government
Telkom South Africa

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This study describes the efforts to improve an ASR system that can process code-switched South African speech. They used a balanced corpus and explored various methods to address data sparsity. The inclusion of both in-domain and out-of-domain data sources led to improvements, and TDNN-F architectures outperformed TDNN-BLSTM models in their data-sparse scenario.

Most automatic speech recognition (ASR) systems are optimised for one specific language and their performance consequently deteriorates drastically when confronted with multilingual or code-switched speech. We describe our efforts to improve an ASR system that can process code-switched South African speech that contains English and four indigenous languages: isiZulu, isiXhosa, Sesotho and Setswana. We begin using a newly developed language-balanced corpus of code-switched speech compiled from South African soap operas, which are rich in spontaneous code-switching. The small size of the corpus makes this scenario under-resourced, and hence we explore several ways of addressing this sparsity of data. We consider augmenting the acoustic training sets with in-domain data at the expense of making it unbalanced and dominated by English. We further explore the inclusion of monolingual out-of-domain data in the constituent languages. For language modelling, we investigate the inclusion of outof-domain text data sources and also the inclusion of synthetically-generated code-switch bigrams. In our experiments, we consider two system architectures. The first considers four bilingual speech recognisers, each allowing code-switching between English and one of the indigenous languages. The second considers a single pentalingual speech recogniser able to process switching between all five languages. We find that the additional inclusion of each acoustic and text data source leads to some improvements. While in-domain data is substantially more effective, performance gains were also achieved using out-of-domain data, which is often much easier to obtain. We also find that improvements are achieved in all five languages, even when the training set becomes unbalanced and heavily skewed in favour of English. Finally, we find the use of TDNN-F architectures for the acoustic model to consistently outperform TDNN-BLSTM models in our data-sparse scenario.

Code-switched automatic speech recognition in five South African languages

期刊

COMPUTER SPEECH AND LANGUAGE

出版社

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Code-switched automatic speech recognition in five South African languages

期刊

COMPUTER SPEECH AND LANGUAGE

出版社

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文