Journal
INFORMATION PROCESSING & MANAGEMENT
Volume 60, Issue 1
Publisher
ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2022.103116
Keywords
Rumour verification; Generalisability; Rumour dataset; Deep learning
Summary
Research on automated social media rumour verification has achieved high performance with neural models, but their generalisability to datasets beyond the ones they were trained on remains unclear. This study aims to fill this gap by assessing the generalisability of top-performing neural rumour verification models across different architectures. A novel dataset called COVID-RV is collected and released for a more comprehensive evaluation of model performance. The study finds a significant drop in performance when models are tested on datasets different from their training sets. Additionally, the ability of models to generalise in a few-shot learning setup and with updated word embeddings is evaluated.
Abstract
Research on automated social media rumour verification, the task of identifying the veracity of questionable information circulating on social media, has yielded neural models achieving high performance, with accuracy scores that often exceed 90%. However, none of these studies focus on the real-world generalisability of the proposed approaches, that is, whether the models perform well on datasets other than those on which they were initially trained and tested. In this work we aim to fill this gap by assessing the generalisability of top-performing neural rumour verification models covering a range of different architectures, from the perspectives of both topic and temporal robustness. For a more complete evaluation of generalisability, we collect and release COVID-RV, a novel dataset of Twitter conversations revolving around COVID-19 rumours. Unlike other existing COVID-19 datasets, COVID-RV contains conversations around rumours that follow the format of prominent rumour verification benchmarks, while differing from them in topic and time scale, thus allowing a better assessment of the temporal robustness of the models. We evaluate model performance on COVID-RV and three popular rumour verification datasets to understand the limitations and advantages of different model architectures, training datasets and evaluation scenarios. We find a dramatic drop in performance when testing models on a dataset different from that used for training. Further, we evaluate the ability of models to generalise in a few-shot learning setup, as well as when word embeddings are updated with the vocabulary of a new, unseen rumour. Drawing upon our experiments, we discuss challenges and make recommendations for future research directions in addressing this important problem.