4.7 Article

Evaluating the generalisability of neural rumour verification models

Journal

INFORMATION PROCESSING & MANAGEMENT
Volume 60, Issue 1, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2022.103116

Keywords

Rumour verification; Generalisability; Rumour dataset; Deep learning

Abstract

Research on automated social media rumour verification, the task of identifying the veracity of questionable information circulating on social media, has yielded neural models achieving high performance, with accuracy scores that often exceed 90%. However, none of these studies focus on the real-world generalisability of the proposed approaches, that is whether the models perform well on datasets other than those on which they were initially trained and tested. In this work we aim to fill this gap by assessing the generalisability of top performing neural rumour verification models covering a range of different architectures from the perspectives of both topic and temporal robustness. For a more complete evaluation of generalisability, we collect and release COVID-RV, a novel dataset of Twitter conversations revolving around COVID-19 rumours. Unlike other existing COVID-19 datasets, our COVID-RV contains conversations around rumours that follow the format of prominent rumour verification benchmarks, while being different from them in terms of topic and time scale, thus allowing better assessment of the temporal robustness of the models. We evaluate model performance on COVID-RV and three popular rumour verification datasets to understand limitations and advantages of different model architectures, training datasets and evaluation scenarios. We find a dramatic drop in performance when testing models on a different dataset from that used for training. Further, we evaluate the ability of models to generalise in a few-shot learning setup, as well as when word embeddings are updated with the vocabulary of a new, unseen rumour. Drawing upon our experiments we discuss challenges and make recommendations for future research directions in addressing this important problem.
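
The sketch below illustrates the cross-dataset evaluation protocol described in the abstract: a verification model fitted on one rumour dataset is scored on another, and the gap between in-domain and cross-dataset scores quantifies the generalisability drop. It is a minimal illustration only, assuming a TF-IDF plus logistic-regression stand-in for the neural models and hypothetical load_pheme / load_covid_rv data loaders; it is not the authors' released code.

# Minimal sketch of cross-dataset rumour verification evaluation.
# A TF-IDF + logistic regression pipeline stands in for the neural models;
# load_pheme() and load_covid_rv() are hypothetical loaders returning
# (texts, labels) pairs for rumour source posts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline


def cross_dataset_eval(train_data, test_data):
    """Train a classifier on one rumour dataset and evaluate it on another."""
    train_texts, train_labels = train_data
    test_texts, test_labels = test_data

    model = make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    model.fit(train_texts, train_labels)

    preds = model.predict(test_texts)
    return {
        "accuracy": accuracy_score(test_labels, preds),
        "macro_f1": f1_score(test_labels, preds, average="macro"),
    }


# Example usage (hypothetical loaders):
# in_domain = cross_dataset_eval(load_pheme("train"), load_pheme("test"))
# cross_domain = cross_dataset_eval(load_pheme("train"), load_covid_rv())
# The difference between the two scores measures the cross-dataset drop
# the paper reports.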
