Article

SummEval: Re-evaluating Summarization Evaluation

Publisher

MIT Press
DOI: 10.1162/tacl_a_00373


This study addresses the shortcomings of text summarization evaluation methods by re-evaluating automatic evaluation metrics, benchmarking recent summarization models, sharing a large collection of model-generated summaries, providing an evaluation toolkit, and assembling a diverse collection of human judgments on model-generated summaries.
The scarcity of comprehensive, up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization model outputs along with expert and crowd-sourced human annotations; 2) we consistently benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics; 3) we assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format; 4) we implement and share a toolkit that provides an extensible and unified API for evaluating summarization models across a broad range of automatic metrics; and 5) we assemble and share the largest and, in terms of model types, most diverse collection of human judgments of model-generated summaries on the CNN/DailyMail dataset, annotated by both expert judges and crowd-sourced workers. We hope that this work will help promote a more complete evaluation protocol for text summarization as well as advance research in developing evaluation metrics that better correlate with human judgments.
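The metric re-evaluation described in dimension 1) amounts to measuring how well each automatic metric's scores track human annotations of the same summaries, typically via a rank correlation. Below is a minimal, self-contained sketch of that comparison using Kendall's tau; the metric and human scores are toy numbers for illustration only, not data from the paper, and the real study uses the full annotation sets and established statistics packages.

```python
def kendall_tau(xs, ys):
    """Kendall rank correlation: (concordant - discordant) / total pairs.

    A pair of summaries (i, j) is concordant when the automatic metric
    and the human judges order them the same way, discordant otherwise.
    """
    assert len(xs) == len(ys) and len(xs) > 1
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy per-summary scores from one hypothetical automatic metric ...
metric_scores = [0.42, 0.31, 0.57, 0.25, 0.49]
# ... and hypothetical expert judgments (e.g., 1-5 consistency ratings)
# for the same five summaries.
human_scores = [4.0, 3.0, 5.0, 2.0, 4.5]

tau = kendall_tau(metric_scores, human_scores)
print(f"Kendall tau = {tau:.3f}")  # higher tau = metric better tracks humans
```

In this toy case the two score lists induce the same ranking, so tau is 1.0; a metric that frequently disagrees with human ordering would score near 0 or below.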

