☆ 4.7 Article

An empirical study of the design choices for local citation recommendation systems

EXPERT SYSTEMS WITH APPLICATIONS (2022)

期刊

EXPERT SYSTEMS WITH APPLICATIONS

卷 200, 期 -, 页码 -

出版社

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2022.116852

关键词

Natural language processing; Citation recommendation; Information retrieval; BM25; SPECTER; Negative sampling

类别

Computer Science, Artificial Intelligence Engineering, Electrical & Electronic Operations Research & Management Science

资金

Croatian Science Foundation [HRZZ-DOK-2018-09]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

The study demonstrates the impact of three important design choices in building local citation recommendation systems: parameters of prefiltering models, training regime, and negative sampling strategy. Optimizing these choices can significantly improve the system's performance.

As the number of published research articles grows on a daily basis, it is becoming increasingly difficult for scientists to keep up with the published work. Local citation recommendation (LCR) systems, which produce a list of relevant articles to be cited in a given text passage, could help alleviate the burden on scientists and facilitate research. While research on LCR is gaining popularity, building such systems involves a number of important design choices that are often overlooked. We present an empirical study of the impact of the three design choices in two-stage LCR systems consisting of a prefiltering and a reranking phase. In particular, we investigate (1) the impact of the prefiltering models' parameters on the model's performance, as well as the impact of (2) the training regime and (3) negative sampling strategy on the performance of the reranking model. We evaluate various combinations of these parameters on two datasets commonly used for LCR and demonstrate that specific combinations improve the model's performance over the widely used standard approaches. Specifically, we demonstrate that (1) optimizing prefiltering models' parameters improves R@1000 in the range of 3% to 12% in absolute value, (2) using the strict training regime improves both R@10 and MRR (up to a maximum of 3.4% and 2.6%, respectively) in all combinations of dataset and prefiltering model, and (3) a careful choice of negative examples can further improve both R@10 and MRR (up to a maximum of 11.9% and 8%, respectively) in both datasets used Our results show that the design choices we considered are important and should be given greater consideration when building LCR systems.

An empirical study of the design choices for local citation recommendation systems

期刊

EXPERT SYSTEMS WITH APPLICATIONS

出版社

PERGAMON-ELSEVIER SCIENCE LTD

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

An empirical study of the design choices for local citation recommendation systems

期刊

EXPERT SYSTEMS WITH APPLICATIONS

出版社

PERGAMON-ELSEVIER SCIENCE LTD

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文