Proceedings Paper

On the Current State of Reproducibility and Reporting of Uncertainty for Aspect-Based Sentiment Analysis

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-031-26390-3_31

Keywords

Natural Language Processing; Sentiment analysis; Pre-trained language models; Reproducibility

Aspect-Based Sentiment Analysis has been a prominent field in Natural Language Processing over the past decade. Methods built on the transformer architecture behind BERT have brought significant improvements, yet reported results are only meaningful if they can be reproduced and are accompanied by uncertainty measures. This research provides a comprehensive comparison of six architectures, highlighting the difficulty of replicating reported performance and the importance of accounting for uncertainty.
For the latter part of the past decade, Aspect-Based Sentiment Analysis has been a field of great interest within Natural Language Processing. Supported by the Semantic Evaluation Conferences in 2014–2016, a variety of methods has been developed, each competing to improve performance on benchmark data sets. Exploiting the transformer architecture behind BERT, results improved rapidly, and efforts in this direction continue today. Our contribution to this body of research is a holistic comparison of six different architectures, each of which achieved (near) state-of-the-art results at some point in time. We utilize a broad spectrum of five publicly available benchmark data sets and introduce a fixed setting with respect to the pre-processing, the train/validation splits, the performance measures and the quantification of uncertainty. Overall, our findings are two-fold: First, the results reported in the scientific articles are hardly reproducible, since in our experiments the observed performance most of the time fell short of the reported one. Second, the results are burdened with notable uncertainty, depending on the data splits, which is why reporting uncertainty measures is crucial.
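
To make the second finding concrete, the following is a minimal sketch of how a performance measure could be reported together with its spread across several train/validation splits. The abstract does not state which uncertainty measure the authors use, so the choice of mean, sample standard deviation and observed range is an assumption. The function names `summarize_scores` and `within_observed_range` are hypothetical, and the scores are placeholder values, not results from the paper.

```python
import statistics


def summarize_scores(scores):
    """Summarize one metric measured over several train/validation splits."""
    if len(scores) < 2:
        raise ValueError("need at least two splits to estimate spread")
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample standard deviation (n - 1)
        "min": min(scores),
        "max": max(scores),
    }


def within_observed_range(reported, scores):
    """Check whether a reported score lies inside the range observed over splits."""
    return min(scores) <= reported <= max(scores)


# Placeholder accuracies for five hypothetical splits (NOT values from the paper).
placeholder_scores = [0.80, 0.82, 0.78, 0.81, 0.79]

summary = summarize_scores(placeholder_scores)
print(
    f"accuracy: {summary['mean']:.3f} +/- {summary['std']:.3f} "
    f"(min {summary['min']:.2f}, max {summary['max']:.2f}, n={summary['n']})"
)
```

Reporting the spread alongside the mean lets a reader judge whether a gap between a published score and a re-run is larger than the variation caused by the data split alone.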
