Article

unarXive: a large scholarly data set with publications' full-text, annotated in-text citations, and links to metadata

SCIENTOMETRICS (2020)

Journal

SCIENTOMETRICS

Volume 125, Issue 3, Pages 3085-3108

Publisher

SPRINGER

DOI: 10.1007/s11192-020-03382-z

Keywords

Scholarly data; Citations; arXiv.org; Digital libraries; Data set

Categories

Computer Science, Interdisciplinary Applications; Information Science & Library Science

Funding

Projekt DEAL

Abstract

In recent years, scholarly data sets have been used for various purposes, such as paper recommendation, citation recommendation, citation context analysis, and citation context-based document summarization. The evaluation of approaches to such tasks and their applicability in real-world scenarios depend heavily on the data set used. However, existing scholarly data sets are limited in several regards. In this paper, we propose a new data set based on all publications from all scientific disciplines available on arXiv.org. Apart from providing the papers' plain text, in-text citations were annotated via global identifiers. Furthermore, citing and cited publications were linked to the Microsoft Academic Graph, providing access to rich metadata. Our data set consists of over one million documents and 29.2 million citation contexts. The data set, which is made freely available for research purposes, can not only enhance the future evaluation of research paper-based and citation context-based approaches, but also serve as a basis for new ways to analyze in-text citations, as we show prototypically in this article.
