Article; Data Paper

Quantitative monitoring of nucleotide sequence data from genetic resources in context of their citation in the scientific literature

Journal

GIGASCIENCE
Volume 10, Issue 12

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/gigascience/giab084

Keywords

data citation; nucleotide sequence data; Europe PMC; European Nucleotide Archive; text mining; Convention on Biological Diversity; digital sequence information

Funding

  1. German Federal Ministry of Education and Research (BMBF) [FKZ 031B0862]


Linking nucleotide sequence data (NSD) to citations in scientific publications improves understanding of data provenance and usage and supports the analysis of global trends in scientific knowledge gain. A data quality review, best-practice recommendations for citation extraction, and a data warehouse support the exploration of NSD use and summary statistics. The global provision and use of NSD allow scientists worldwide to join literature and sequence databases in a multidimensional manner.
Background: Linking nucleotide sequence data (NSD) to citations in scientific publications can improve understanding of NSD provenance and of their use and reuse in the scientific community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, it becomes possible to assess the contribution of NSD and to infer trends in scientific knowledge gain at the global level.

Findings: We extracted records from the European Nucleotide Archive (ENA) and linked them to citations in open-access publications aggregated at Europe PubMed Central (Europe PMC). A total of 8,464,292 ENA accessions with geographical provenance information were associated with publications. We conducted a data quality review to uncover potential issues in the extraction of publication citation information and in author affiliation tagging, and we developed and implemented best-practice recommendations for citation extraction. We constructed flat data tables and a data warehouse with an interactive web application to enable ad hoc exploration of NSD use and summary statistics.

Conclusions: Extracting NSD and linking them with their associated publication citations makes NSD use and reuse transparent. The quality review contributes to improved text-mining methods for identifier extraction and use. Furthermore, the global provision and use of NSD enable scientists worldwide to join literature and sequence databases in a multidimensional fashion. As a concrete use case, we visualized statistics of country clusters concerning NSD access in the context of discussions around digital sequence information under the United Nations Convention on Biological Diversity.
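The extract-and-link step summarized in the Findings can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the accession pattern is a simplified assumption that covers only the classic INSDC nucleotide formats (one letter plus five digits, or two letters plus six or eight digits, with an optional version suffix), and `extract_accessions`, `link_citations`, and the toy `provenance` table are hypothetical stand-ins for Europe PMC text mining and ENA records carrying geographic metadata.

```python
import re
from collections import Counter

# Simplified accession pattern (an assumption for illustration): one letter + 5 digits,
# or two letters + 6 or 8 digits, optionally followed by a version such as ".3".
ACCESSION_RE = re.compile(r"\b([A-Z]\d{5}|[A-Z]{2}\d{6}(?:\d{2})?)(?:\.\d+)?\b")


def extract_accessions(text: str) -> set[str]:
    """Return unversioned candidate accessions found in free text."""
    return {m.group(1) for m in ACCESSION_RE.finditer(text)}


def link_citations(
    articles: dict[str, tuple[str, str]],
    provenance: dict[str, str],
) -> Counter:
    """Count (author country, provenance country) pairs over all articles.

    articles:   hypothetical mapping article_id -> (author_country, full_text)
    provenance: hypothetical mapping accession -> country of sample origin,
                standing in for ENA records with geographic metadata.
    Candidates absent from `provenance` are dropped, which filters out
    pattern matches that are not real sequence records.
    """
    pairs: Counter = Counter()
    for author_country, text in articles.values():
        for acc in extract_accessions(text):
            origin = provenance.get(acc)
            if origin is not None:
                pairs[(author_country, origin)] += 1
    return pairs


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only.
    provenance = {"MN908947": "China", "AB123456": "Japan"}
    articles = {
        "PMC_A": ("Germany", "We reanalysed MN908947.3 and AB123456 (see Figure S12345)."),
        "PMC_B": ("Brazil", "Reads were mapped to MN908947.3."),
    }
    print(sorted(link_citations(articles, provenance).items()))
    # prints [(('Brazil', 'China'), 1), (('Germany', 'China'), 1), (('Germany', 'Japan'), 1)]
```

The label `Figure S12345` matches the pattern but is discarded because no record exists for it. Pattern matching alone produces false positives of this kind, which is why extracted identifiers need to be validated against the sequence archive and why a data quality review of citation extraction is worthwhile.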
