Article

Synthesised bigrams using word embeddings for code-switched ASR of four South African language pairs

Journal

COMPUTER SPEECH AND LANGUAGE
Volume 54, Pages 151-175

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
DOI: 10.1016/j.csl.2018.10.002

Keywords

Code-switching; Word embeddings; IsiZulu; IsiXhosa; Setswana; Sesotho

Funding

  1. Nvidia Corporation
  2. Telkom South Africa
  3. Department of Arts and Culture of South Africa

Abstract

Code-switching is the phenomenon whereby multilingual speakers spontaneously alternate between more than one language during discourse, and it is widespread in multilingual societies. Current state-of-the-art automatic speech recognition (ASR) systems are optimised for monolingual speech, but performance degrades severely when they are presented with multiple languages. We address ASR of speech containing switches between English and four South African Bantu languages. No comparable study on code-switched speech for these languages has been conducted before, and consequently no directly applicable benchmarks exist. A new and unique corpus containing 14.3 hours of spontaneous speech extracted from South African soap operas was used for our study. The varied nature of the code-switching in this data presents many challenges to ASR. We focus specifically on how the language model can be improved to better model bilingual language switches for English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. Code-switching examples in the corpus transcriptions were extremely sparse, with the majority of code-switched bigrams occurring only once. Furthermore, differences in language typology between English and the Bantu languages, and among the Bantu languages themselves, pose further challenges. We propose a new method that uses word embeddings, trained on text data that is both out-of-domain and monolingual, to synthesise artificial bilingual code-switched bigrams that augment the sparse language modelling training data. This technique has the particular advantage of not requiring any additional training data that includes code-switching. We show that the proposed approach is able to synthesise valid code-switched bigrams not seen in the training set. We also show that, by augmenting the training set with these bigrams, we are able to achieve notable reductions in the overall perplexity for all language pairs, and particularly substantial reductions in the perplexity calculated across a language switch boundary (between 5 and 31%). We demonstrate that the proposed approach is able to reduce the number of unseen code-switched bigram types in the test sets by up to 20.5%. Finally, we show that the augmented language models achieve reductions in the word error rate for three of the four language pairs considered. The gains were larger for language pairs with disjunctive orthography than for those with conjunctive orthography. We conclude that the augmentation of language model training data with code-switched bigrams synthesised using word embeddings trained on out-of-domain monolingual text is a viable means of improving the performance of ASR for code-switched speech. (C) 2018 Elsevier Ltd. All rights reserved.
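To make the core idea concrete, below is a minimal sketch of synthesising artificial code-switched bigrams by substituting embedding-space neighbours into observed switch bigrams. This is not the authors' implementation: the choice of gensim Word2Vec as the embedding model, the toy corpora, the neighbour count K, and the helper synthesise_bigrams are all illustrative assumptions; the key point is that the embeddings are trained on monolingual, out-of-domain text only.

```python
# Sketch (illustrative, not the paper's code): generate unseen
# English -> isiZulu switch bigrams from a handful of observed ones
# using per-language word embeddings trained on monolingual text.
from gensim.models import Word2Vec

K = 1  # number of embedding-space neighbours to substitute (illustrative)

# Placeholder monolingual, out-of-domain corpora.
english_sentences = [["the", "teacher", "arrived", "late"],
                     ["the", "doctor", "arrived", "early"]]
isizulu_sentences = [["uthisha", "ufike", "emuva"],
                     ["udokotela", "ufike", "kuqala"]]

# One embedding model per language, trained on monolingual text only.
en_model = Word2Vec(english_sentences, vector_size=50, min_count=1, seed=1)
zu_model = Word2Vec(isizulu_sentences, vector_size=50, min_count=1, seed=1)

def synthesise_bigrams(observed_bigrams, en_model, zu_model, k=K):
    """For each observed switch bigram (e, z), pair the k nearest
    English neighbours of e with the k nearest isiZulu neighbours of z
    to produce candidate switch bigrams not seen in training."""
    synthesised = set()
    for e, z in observed_bigrams:
        en_words = [e] + [w for w, _ in en_model.wv.most_similar(e, topn=k)]
        zu_words = [z] + [w for w, _ in zu_model.wv.most_similar(z, topn=k)]
        for e2 in en_words:
            for z2 in zu_words:
                if (e2, z2) not in observed_bigrams:
                    synthesised.add((e2, z2))
    return synthesised

# A single observed switch bigram yields several candidates that can be
# appended (suitably weighted) to the language-model training data.
observed = {("teacher", "ufike")}
print(synthesise_bigrams(observed, en_model, zu_model))
```

The design rationale, as the abstract describes it, is that words close in embedding space tend to be substitutable in context, so a valid switch bigram remains plausible when either side is replaced by a near neighbour from its own language's embedding space; no code-switched training text is required to propose the new bigrams.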
