Article

Credit distribution in relational scientific databases

Journal

INFORMATION SYSTEMS
Volume 109, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.is.2022.102060

Keywords

Data citation; Data credit; Provenance; Causality and responsibility; Shapley value

Funding

  1. ExaMode project, as part of the European Union H2020 program [825292]

Digital data is a fundamental research product, but the concept of data credit, which represents the importance of cited data, is still not well understood. This paper explores the problem of distributing credit to database parts responsible for producing data cited by a research entity. The authors use the IUPHAR/BPS Guide to Pharmacology as a case study and define three distribution strategies based on how-provenance, responsibility, and the Shapley value. The paper demonstrates how credit can serve as a bibliometric measure for data and their curators, highlighting frequently-used database areas and rewarding research impact.
Digital data is a basic form of research product for which citation, and the generation of credit or recognition for authors, are still not well understood. The notion of data credit has therefore recently emerged as a new measure, defined on the groundwork laid by data citation. Data credit is a real value representing the importance of data cited by a research entity. We can use credit to annotate data contained in a curated scientific database and then as a proxy for the significance and impact of that data in the research world. Together with citations, it helps recognize the value of data and of its creators. In this paper, we explore the problem of Data Credit Distribution, the process by which credit is distributed to the database parts responsible for producing the data cited by a research entity. We adopt as a use case the IUPHAR/BPS Guide to Pharmacology (GtoPdb), a widely used curated scientific relational database. We focus on Select-Project-Join (SPJ) queries under bag semantics, and we define three distribution strategies based on how-provenance, responsibility, and the Shapley value. Using these distribution strategies, we show how credit can highlight frequently used database areas and how it can be used as a new bibliometric measure for data and their curators. In particular, credit rewards data and authors based on their research impact, not only on their citation count. We also show how these distribution strategies vary in their sensitivity to the role of an input tuple in the generation of the output data, and thus reward input tuples differently.

(c) 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
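
The three strategies named above can be illustrated on a single cited output tuple. The Python sketch below is a hypothetical toy example, not the authors' implementation or the GtoPdb schema: the lineage t = a·b + a·c, the tuple names, the equal-split rule for how-provenance, and the normalization of responsibility scores are all assumptions made for illustration. Responsibility follows the usual causal definition 1/(1 + |Γ|) for a smallest contingency set Γ, and the Shapley value is computed exactly over the game "v(S) = 1 if the tuples in S can still derive the output". Both are brute force, so this only suits tiny lineages.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

# Hypothetical lineage of one cited output tuple: its how-provenance as a
# sum of monomials, each listing the input tuples joined by one derivation.
# Repeated names inside a monomial are allowed (bag semantics, self-joins).
Polynomial = list[tuple[str, ...]]


def players(poly: Polynomial) -> list[str]:
    """All distinct input tuples appearing in the polynomial."""
    return sorted({x for mono in poly for x in mono})


def produced(poly: Polynomial, available: set[str]) -> bool:
    """True if at least one derivation uses only available tuples."""
    return any(set(mono) <= available for mono in poly)


def how_provenance_credit(poly: Polynomial, credit=Fraction(1)):
    """Assumed rule: split credit equally over monomials, then equally
    over the tuple occurrences inside each monomial."""
    out = {x: Fraction(0) for x in players(poly)}
    per_mono = credit / len(poly)
    for mono in poly:
        for x in mono:
            out[x] += per_mono / len(mono)
    return out


def responsibility(poly: Polynomial, x: str) -> Fraction:
    """1 / (1 + |G|) for the smallest set G whose removal keeps the output
    but makes x counterfactual; 0 if no such set exists."""
    everyone = set(players(poly))
    others = sorted(everyone - {x})
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            rest = everyone - set(gamma)
            if produced(poly, rest) and not produced(poly, rest - {x}):
                return Fraction(1, 1 + k)
    return Fraction(0)


def responsibility_credit(poly: Polynomial, credit=Fraction(1)):
    """Assumed rule: normalize responsibilities so they sum to the credit."""
    resp = {x: responsibility(poly, x) for x in players(poly)}
    total = sum(resp.values())
    return {x: credit * r / total for x, r in resp.items()}


def shapley_credit(poly: Polynomial, credit=Fraction(1)):
    """Exact Shapley value of v(S) = credit if S can derive the output."""
    ps = players(poly)
    n = len(ps)
    out = {}
    for x in ps:
        others = [p for p in ps if p != x]
        phi = Fraction(0)
        for k in range(n):
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of x to coalition s (0 or 1).
                gain = int(produced(poly, s | {x})) - int(produced(poly, s))
                phi += weight * gain
        out[x] = credit * phi
    return out


if __name__ == "__main__":
    poly = [("a", "b"), ("a", "c")]  # toy lineage t = a*b + a*c
    for name, fn in [("how-provenance", how_provenance_credit),
                     ("responsibility", responsibility_credit),
                     ("Shapley", shapley_credit)]:
        print(name, {x: str(v) for x, v in fn(poly).items()})
```

On this toy lineage, the equal-split how-provenance rule and the normalized responsibility both give a a credit of 1/2 and b, c 1/4 each, whereas the Shapley value gives a 2/3 and b, c 1/6 each. This small gap mirrors the point made in the abstract: the strategies differ in how strongly they reward a tuple, such as a, that appears in every derivation of the output.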