Article

A code-mixed task-oriented dialog dataset for medical domain

Journal

COMPUTER SPEECH AND LANGUAGE
Volume 78

Publisher

ACADEMIC PRESS LTD - ELSEVIER SCIENCE LTD
DOI: 10.1016/j.csl.2022.101449

Keywords

Code-mixed; Dialog dataset; Medical domain; Task oriented

In the healthcare domain, the interactions between medical professionals and patients are crucial for diagnosis. However, existing AI models in healthcare are designed for monolingual data and cannot handle code-mixed conversations in multilingual regions. To facilitate the research and development of code-mixed medical dialog systems, we introduce a dataset of code-mixed medical dialogs and provide baselines for benchmarking.
In the healthcare domain, interactions between medical professionals and patients form a crucial part of diagnosis. Early AI models developed for healthcare centered only on monolingual data. However, such models do not cater to multilingual regions, where most conversations are Code-Mixed. We present the Code-Mixed Medical Task-Oriented Dialog Dataset to facilitate the research and development of Code-Mixed medical dialog systems. We analyzed the dataset using medical, conversational, and linguistic theories. The dataset contains 3005 Telugu-English Code-Mixed dialogs between patients and doctors, with 29k utterances covering ten specializations and an average code-mixing index (CMI) of 33.3%. We manually annotated the conversational dataset with intent and slot labels. We also present baselines to establish benchmarks on the dataset using existing state-of-the-art Natural Language Understanding (NLU) models. We improved on these baselines by using contextual ground-truth intent labels and by processing the slots as chunks. The data is made publicly available.
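The abstract reports an average code-mixing index (CMI) without saying how it is computed. As a rough illustration only, the sketch below assumes the commonly used utterance-level definition, CMI = 100 × (1 − max_i(w_i) / (n − u)), where n is the number of tokens, u the number of language-independent tokens, and w_i the number of tokens in language i. The paper may use a different variant. The tag names "te", "en", and "univ" and the function names are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def utterance_cmi(lang_tags, neutral_tags=("univ",)):
    """CMI of one utterance from per-token language tags (assumed definition)."""
    # Count only tokens that carry a language; neutral tokens
    # (punctuation, numbers, named entities) are excluded.
    counts = Counter(tag for tag in lang_tags if tag not in neutral_tags)
    language_tokens = sum(counts.values())
    if language_tokens == 0:
        # No language-bearing tokens: treat the utterance as not mixed.
        return 0.0
    dominant = max(counts.values())
    return 100.0 * (1.0 - dominant / language_tokens)


def average_cmi(utterances):
    """Mean utterance-level CMI over a collection of tagged utterances."""
    if not utterances:
        return 0.0
    return sum(utterance_cmi(tags) for tags in utterances) / len(utterances)


if __name__ == "__main__":
    # Hypothetical tagged utterance: two Telugu tokens, one English, one neutral.
    example = ["te", "te", "en", "univ"]
    print(round(utterance_cmi(example), 1))
```

Under this definition a monolingual utterance scores 0, and an utterance split evenly between two languages scores 50.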
