Journal
COMPUTER SPEECH AND LANGUAGE
Volume 78, Issue -, Pages -
Publisher
ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
DOI: 10.1016/j.csl.2022.101449
Keywords
Code-mixed; Dialog dataset; Medical domain; Task oriented
In the healthcare domain, the interactions between medical professionals and patients are crucial for diagnosis. However, existing AI models in healthcare are designed for monolingual data and cannot handle code-mixed conversations in multilingual regions. To facilitate the research and development of code-mixed medical dialog systems, we introduce a dataset of code-mixed medical dialogs and provide baselines for benchmarking.
In the healthcare domain, interactions between medical professionals and patients form a crucial part of diagnosis. Early AI models developed for healthcare centered only on monolingual data. However, such models do not cater to multilingual regions, where most conversations are Code-Mixed. We present the Code-Mixed Medical Task-Oriented Dialog Dataset to facilitate the research and development of Code-Mixed medical dialog systems. We analyzed the dataset using medical, conversational, and linguistic theories. The dataset contains 3005 Telugu-English Code-Mixed dialogs between patients and doctors, with 29k utterances covering ten specializations and an average code-mixing index (CMI) of 33.3%. We manually annotated the conversational dataset with intent and slot labels. We also present baselines to establish benchmarks on the dataset using existing state-of-the-art Natural Language Understanding (NLU) models. We improved on these baselines by using contextual ground-truth intent labels and by processing the slots as chunks. The data is made publicly available.
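The abstract reports an average code-mixing index (CMI) but does not state how it was computed. As a rough illustration only, the sketch below uses the commonly cited utterance-level definition, CMI = 100 × (1 − max(w_i) / (n − u)), where n is the number of tokens, u is the number of language-independent tokens, and max(w_i) is the token count of the dominant language. The function name, the tag names ("te", "en", "univ"), and the sample tag sequence are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def code_mixing_index(lang_tags, neutral_tag="univ"):
    """Utterance-level CMI from per-token language tags (illustrative sketch).

    lang_tags: one language tag per token, e.g. "te", "en", or the
    hypothetical neutral tag for language-independent tokens.
    """
    n = len(lang_tags)
    # Count tokens per language, ignoring language-independent tokens.
    counts = Counter(tag for tag in lang_tags if tag != neutral_tag)
    u = n - sum(counts.values())
    if n == u:
        # No language-specific tokens, so there is no mixing to measure.
        return 0.0
    dominant = max(counts.values())
    return 100.0 * (1 - dominant / (n - u))


# Hypothetical tag sequence: three Telugu tokens, one English, one neutral.
tags = ["te", "en", "te", "te", "univ"]
print(code_mixing_index(tags))  # 25.0
```

A corpus-level figure such as the one reported would typically be an average of this value over utterances, though the exact averaging scheme used by the authors is not given here.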