☆ 4.5 Article

A code-mixed task-oriented dialog dataset for medical domain

COMPUTER SPEECH AND LANGUAGE (2023)

Journal

COMPUTER SPEECH AND LANGUAGE

Volume 78, Issue -, Pages -

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

DOI: 10.1016/j.csl.2022.101449

Keywords

Code-mixed; Dialog dataset; Medical domain; Task oriented

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

In the healthcare domain, the interactions between medical professionals and patients are crucial for diagnosis. However, existing AI models in healthcare are designed for monolingual data and cannot handle code-mixed conversations in multilingual regions. To facilitate the research and development of code-mixed medical dialog systems, we introduce a dataset of code-mixed medical dialogs and provide baselines for benchmarking.

In the healthcare domain, medical and patient interactions form a crucial part of the diagnosis. Initially, the AI models developed for healthcare centered only on monolingual data. However, such models do not cater to the multilingual regions, where most conversations are Code-Mixed. We present the Code-Mixed Medical Task-Oriented Dialog Dataset to facilitate the research and development of Code-Mixed medical dialog systems. We analyzed the dataset using medical, conversational, and linguistic theories. The dataset contains 3005 Telugu-English Code-Mixed dialogs between patients and doctors with 29 k utterances covering ten specializations with an average code-mixing index (CMI) of 33.3%. We manually annotated the conversational dataset with intents and slot labels. We also present baselines to establish benchmarks on the dataset using existing state-of-the-art Natural Language Understanding (NLU) models. We improved the existing baselines using contextual ground truth intent labels and processing the slots as chunks. The data is made publically available.1

A code-mixed task-oriented dialog dataset for medical domain

Journal

COMPUTER SPEECH AND LANGUAGE

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

A code-mixed task-oriented dialog dataset for medical domain

Journal

COMPUTER SPEECH AND LANGUAGE

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper