Article

ExpMRC: explainability evaluation for machine reading comprehension

Journal

HELIYON
Volume 8, Issue 4

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.heliyon.2022.e09290

Keywords

Machine reading comprehension; Explainable artificial intelligence; Natural language processing

Funding

  1. National Key Research and Development Program of China [2018YFB1005100]


This paper proposes a new benchmark, ExpMRC, for evaluating the textual explainability of Machine Reading Comprehension (MRC) systems. The benchmark consists of four subsets with additional annotations of answer evidence. State-of-the-art pre-trained language models are used to build baseline systems, and unsupervised approaches are adopted to extract answer and evidence spans. Experimental results show that current systems still fall short of human performance.

Abstract

Achieving human-level performance on some Machine Reading Comprehension (MRC) datasets is no longer challenging with the help of powerful Pre-trained Language Models (PLMs). However, it is necessary to provide both the answer prediction and its explanation to further improve an MRC system's reliability, especially for real-life applications. In this paper, we propose a new benchmark called ExpMRC for evaluating the textual explainability of MRC systems. ExpMRC contains four subsets, including SQuAD, CMRC 2018, RACE+, and C3, with additional annotations of the answer's evidence. The MRC systems are required to give not only the correct answer but also its explanation. We use state-of-the-art PLMs to build baseline systems and adopt various unsupervised approaches to extract both answer and evidence spans without human-annotated evidence spans. The experimental results show that these models are still far from human performance, suggesting that ExpMRC is challenging. Resources (data and baselines) are available at https://github.com/ymcui/expmrc.


