
IITG-HingCoS corpus: A Hinglish code-switching database for automatic speech recognition

SPEECH COMMUNICATION (2019)

Journal

SPEECH COMMUNICATION

Volume 110, Pages 76-89

Publisher

ELSEVIER

DOI: 10.1016/j.specom.2019.04.007

Keywords

Code-switching; Speech and text corpora; Automatic speech recognition; Language modeling

Categories

Acoustics; Computer Science, Interdisciplinary Applications

Funding

Ministry of Electronics and Information Technology [11(18)/2012-HCC]

Abstract

Code-switching is a linguistic phenomenon that refers to the use of two or more languages, especially within the same discourse. It has been observed in many multilingual communities across the globe. Recently, there has been increasing demand for automatic speech recognition (ASR) systems that can handle code-switching. However, very limited code-switching resources are available so far for training such systems, so the development of such resources is highly desirable. In this work, we describe the collection of a Hinglish (Hindi-English) code-switching database at the Indian Institute of Technology Guwahati (IITG), referred to as the IITG-HingCoS corpus. The corpus consists of code-switching text data comprising 25,988 sentences with a total of 0.58 million words. In addition, it contains 25 h of matching speech data corresponding to 9,251 code-switching sentences covering a vocabulary of 6,542 words. This paper describes the sources and the protocol used for collecting the corpus. Baseline experimental results on the collected corpus for language modeling and ASR tasks are also presented.
