Article

Multi-label biomedical question classification for lexical answer type prediction

Journal

JOURNAL OF BIOMEDICAL INFORMATICS
Volume 93

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2019.103143

Keywords

Biomedical question classification; Lexical answer type prediction; Biomedical LAT corpus; Multi-label classification

Question classification is considered one of the most significant phases of a typical Question Answering (QA) system. It assigns answer types to each question, which narrows the search space of possible answers for factoid and list-type questions. This process is also known as Lexical Answer Type (LAT) prediction. Although much work has been done to improve the classification of questions into coarse and fine classes in diverse domains, it is still considered a challenging task in the biomedical field. The difficulty in biomedical question classification stems from the fact that one question may have more than one label, or expected answer type, associated with it (a setting referred to as multi-label classification). In the biomedical domain, only preliminary work has been done to classify multi-label questions, by transforming them into single-label questions through the copy transformation technique. In this paper, we generate a multi-labeled corpus (MLBioMedLAT) for the task of biomedical question classification by exploring the process of the Open Advancement of Question Answering (OAQA) system. We use 780 biomedical questions from the BioASQ challenge and assign them appropriate labels. To annotate these labels, we use the answers to each question and assign the question semantic type labels by leveraging an existing corpus and utilizing the OAQA system. The paper introduces a data transformation approach, Label Power Set with logistic regression (LPLR), for multi-label biomedical question classification and compares its performance with Structured SVM (SSVM), Restricted Boltzmann Machine (RBM), and copy transformation based logistic regression (CLR), which was previously used for a similar task in the OAQA system. To evaluate the introduced data transformation technique, we use three prominent evaluation measures: Micro F-1, Accuracy, and Hamming Loss. In terms of Micro F-1, the introduced technique, coupled with a new feature set, surpasses CLR, SSVM, and RBM by margins of 7%, 8%, and 22%, respectively.
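
As a rough illustration of the two data transformations and two of the evaluation measures named in the abstract, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the toy question ids, and the labels ("gene", "protein", "disease") are hypothetical and are not taken from MLBioMedLAT, and no classifier is trained here. The label power set transformation maps each distinct set of labels to a single class, whereas the copy transformation repeats a question once per label.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def copy_transform(questions: Sequence[str],
                   label_sets: Sequence[Set[str]]) -> List[Tuple[str, str]]:
    """Copy transformation: emit one (question, label) pair per label."""
    return [(q, label)
            for q, labels in zip(questions, label_sets)
            for label in sorted(labels)]


def label_powerset_transform(
        label_sets: Sequence[Set[str]]) -> Tuple[List[int], Dict[int, FrozenSet[str]]]:
    """Label power set: map each distinct label set to one class id."""
    class_of: Dict[FrozenSet[str], int] = {}
    y: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class id
        y.append(class_of[key])
    decode = {cid: key for key, cid in class_of.items()}
    return y, decode


def micro_f1(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool TP, FP, FN over all questions and labels."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_loss(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]],
                 n_labels: int) -> float:
    """Fraction of (question, label) decisions that are wrong."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)


# Hypothetical toy data: three questions over three possible labels.
questions = ["q1", "q2", "q3"]
label_sets = [{"gene", "protein"}, {"disease"}, {"gene", "protein"}]

y, decode = label_powerset_transform(label_sets)
copied = copy_transform(questions, label_sets)

# Hypothetical predictions: the third question misses the "protein" label.
pred_sets = [{"gene", "protein"}, {"disease"}, {"gene"}]
f1 = micro_f1(label_sets, pred_sets)
hl = hamming_loss(label_sets, pred_sets, n_labels=3)
```

In this sketch, a single-label classifier such as logistic regression would be trained on the class ids in `y`, and its predictions mapped back to label sets through `decode`. The design trade-off is that the label power set keeps label combinations intact, while the copy transformation treats each label of a question as a separate training example.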
