Article

Potential of natural language processing for metadata extraction from environmental scientific publications

SOIL (2023)

Journal

SOIL

Volume 9, Issue 1, Pages 155-168

Publisher

COPERNICUS GESELLSCHAFT MBH

DOI: 10.5194/soil-9-155-2023

Category

Soil Science

Summary

Summarizing information from large bodies of scientific literature is an essential but work-intensive task. This study explores three NLP techniques (topic modeling, tailored regular expressions, and the shortest dependency path method) to support evidence synthesis tasks. The results show that all three NLP techniques can support such tasks and have the potential to enable evidence syntheses that are updated automatically as new publications become available.

Abstract

Summarizing information from large bodies of scientific literature is an essential but work-intensive task. This is especially true in environmental studies, where multiple factors (e.g., soil, climate, vegetation) can contribute to the observed effects. Meta-analyses, studies that quantitatively summarize the findings of a large body of literature, rely on manually curated databases built from primary publications. However, given the growing amount of literature, this manual work is likely to require ever more effort in the future. Natural language processing (NLP) facilitates this task, but it is not yet clear to what extent the extraction process is reliable or complete. In this work, we explore three NLP techniques that can help support this task: topic modeling, tailored regular expressions and the shortest dependency path method. We apply these techniques in a practical and reproducible workflow on two corpora of documents: the Open Tension-disk Infiltrometer Meta-database (OTIM) corpus and the Meta corpus. The OTIM corpus contains the source publications of the entries of the OTIM database of near-saturated hydraulic conductivity from tension-disk infiltrometer measurements (https://github.com/climasoma/otim-db, last access: 1 March 2023). The Meta corpus consists of all primary studies from 36 selected meta-analyses on the impact of agricultural practices on sustainable water management in Europe. As a first step of our practical workflow, we identified different topics in the individual source publications of the Meta corpus using topic modeling. This enabled us to distinguish well-researched topics (e.g., conventional tillage, cover crops), where meta-analysis would be useful, from neglected topics (e.g., effect of irrigation on soil properties), revealing potential knowledge gaps. Then, we used tailored regular expressions to extract coordinates, soil texture, soil type, rainfall, disk diameter and tensions from the OTIM corpus to build a quantitative database.
We were able to retrieve the respective information with a recall (the share of all relevant information that was retrieved) of 56 % to 100 % and a precision of 83 % to 100 %. Finally, we extracted relationships between a set of drivers corresponding to different soil management practices or amendments (e.g., biochar, zero tillage) and target variables (e.g., soil aggregate, hydraulic conductivity, crop yield) from the abstracts of the Meta corpus source publications, using the shortest dependency path between them. These relationships were further classified according to positive, negative or absent correlations between the driver and the target variable. This quickly provided an overview of the different driver-variable relationships and their abundance for an entire body of literature. Overall, we found that all three tested NLP techniques were able to support evidence synthesis tasks. While human supervision remains essential, NLP methods have the potential to support automated evidence synthesis that can be continuously updated as new publications become available.
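The abstract does not say which topic model was used. Latent Dirichlet allocation (LDA) is one common choice, so the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python, assuming documents are already tokenized and stop-word filtered. It shows how every document receives topic counts, from which one can tally how many documents are dominated by each topic (a rough signal of well-researched versus neglected topics). The function names `lda_gibbs` and `dominant_topic_counts`, the toy corpus and the hyperparameters `alpha` and `beta` are illustrative assumptions, not the authors' setup.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of token lists. Returns (doc_topic, topic_word) count tables.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # n_dk
    topic_word = [Counter() for _ in range(n_topics)]   # n_kw
    topic_total = [0] * n_topics                        # n_k
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def dominant_topic_counts(doc_topic):
    """Number of documents whose most frequent topic is k."""
    return Counter(max(range(len(row)), key=row.__getitem__) for row in doc_topic)


if __name__ == "__main__":
    # Invented toy corpus of pre-tokenized abstracts.
    corpus = [
        ["tillage", "no-till", "yield", "tillage"],
        ["cover", "crops", "erosion", "cover"],
        ["tillage", "yield", "conventional", "no-till"],
        ["cover", "crops", "infiltration", "erosion"],
        ["irrigation", "salinity", "soil"],
    ]
    doc_topic, topic_word = lda_gibbs(corpus, n_topics=3)
    for k, counts in enumerate(topic_word):
        # Unary + drops words whose count fell to zero.
        print(k, [w for w, _ in (+counts).most_common(3)])
    print(dominant_topic_counts(doc_topic))
```

On a real corpus, a tested library implementation would be preferable; the sketch only makes the sampling update explicit, and topics with few dominated documents are candidates for closer inspection rather than proven knowledge gaps.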
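The tailored regular expressions used for the OTIM corpus are not given in the abstract. As a hedged illustration, the following sketch uses Python's standard `re` module with a few hypothetical patterns for coordinates, soil texture, rainfall, disk diameter and tensions, and it also computes precision (correct extractions divided by all extractions) and recall (correct extractions divided by all relevant values). The pattern names, the functions `extract_metadata` and `precision_recall`, and the example sentence are all invented for illustration.

```python
from __future__ import annotations

import re

# Hypothetical patterns; real publications phrase these values in many more ways.
NUMBER = r"-?\d+(?:\.\d+)?"
DISK_DIAMETER = re.compile(rf"dis[ck]\s+diameter\s*(?:of|:|=)?\s*({NUMBER})\s*cm", re.I)
# One number, then more numbers separated by "," or "and", then "cm".
TENSIONS = re.compile(
    rf"tensions?\s*(?:of|:|=)?\s*({NUMBER}(?:(?:\s*,\s*|\s+and\s+){NUMBER})*)\s*cm", re.I
)
RAINFALL = re.compile(rf"rainfall\s*(?:of|:|=|was)?\s*({NUMBER})\s*mm", re.I)
# Decimal degrees such as "48.25° N, 16.37° E".
COORDINATES = re.compile(r"(\d{1,2}(?:\.\d+)?)\s*°\s*([NS])\s*,?\s*(\d{1,3}(?:\.\d+)?)\s*°\s*([EW])")
# Texture classes, longest first so "silt loam" is preferred over "silt" or "loam".
TEXTURE = re.compile(
    r"\b(silty clay loam|silt loam|sandy loam|loamy sand|clay loam|silty clay|loam|clay|sand|silt)\b",
    re.I,
)


def extract_metadata(text: str) -> dict:
    """Return the values that the hypothetical patterns find in `text`."""
    found: dict = {}
    if m := COORDINATES.search(text):
        lat = float(m.group(1)) * (-1 if m.group(2) == "S" else 1)
        lon = float(m.group(3)) * (-1 if m.group(4) == "W" else 1)
        found["coordinates"] = (lat, lon)
    if m := TEXTURE.search(text):
        found["soil_texture"] = m.group(1).lower()
    if m := RAINFALL.search(text):
        found["rainfall_mm"] = float(m.group(1))
    if m := DISK_DIAMETER.search(text):
        found["disk_diameter_cm"] = float(m.group(1))
    if m := TENSIONS.search(text):
        found["tensions_cm"] = [float(x) for x in re.findall(NUMBER, m.group(1))]
    return found


def precision_recall(extracted: set, relevant: set) -> tuple[float, float]:
    """Precision = correct / extracted; recall = correct / relevant."""
    correct = len(extracted & relevant)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = (
        "Measurements were made at 48.25° N, 16.37° E on a silt loam with a "
        "mean annual rainfall of 650 mm, using a disk diameter of 20 cm "
        "and tensions of -12, -6 and -1 cm."
    )
    print(extract_metadata(sentence))
    print(precision_recall({"a", "b", "c"}, {"a", "b", "d", "e"}))
```

Each pattern favors precision by anchoring on a keyword and a unit; values phrased differently are simply missed, which lowers recall. That trade-off is one plausible reason why recall can vary more than precision across variables.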
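The shortest-dependency-path idea can be shown without a parser by writing the dependency parse by hand. The following is a minimal sketch, assuming the parse is given as a list of head indices (the kind of output a dependency parser such as spaCy produces; here it is hand-written), treating the tree as an undirected graph and using breadth-first search to find the path between a driver token and a target token. The polarity rule is a deliberately simple, hypothetical keyword lookup on the path and on its direct dependents (to catch negations such as "not"); the cue lists, function names and example sentences are assumptions, not the classifier used in the study.

```python
from __future__ import annotations

from collections import deque

# Hypothetical cue words; a real classifier would need a much richer lexicon.
POSITIVE = {"increased", "improved", "enhanced", "higher"}
NEGATIVE = {"decreased", "reduced", "lowered", "lower"}
NEGATION = {"not", "no"}


def shortest_dependency_path(heads: list[int], start: int, end: int) -> list[int]:
    """Token indices on the shortest path between start and end.

    heads[i] is the index of token i's syntactic head; the root points to itself.
    """
    # Build an undirected adjacency list from the head links.
    adjacency = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if child != head:
            adjacency[child].add(head)
            adjacency[head].add(child)
    # Breadth-first search: the first time `end` is reached, the path is shortest.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for neighbour in sorted(adjacency[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    if end not in previous:
        return []
    path = [end]
    while previous[path[-1]] is not None:
        path.append(previous[path[-1]])
    return path[::-1]


def classify_relation(tokens: list[str], heads: list[int], driver: int, target: int) -> str:
    """Label a driver-target pair as positive, negative, absent or unclassified."""
    path = shortest_dependency_path(heads, driver, target)
    on_path = {tokens[i].lower() for i in path}
    # Negations usually attach to the verb as a dependent, so also check direct dependents.
    dependents = {tokens[c].lower() for c, h in enumerate(heads) if h in path and c != h}
    if (on_path | dependents) & NEGATION:
        return "absent"
    if on_path & POSITIVE:
        return "positive"
    if on_path & NEGATIVE:
        return "negative"
    return "unclassified"


if __name__ == "__main__":
    # Hand-written parse of "Biochar increased the hydraulic conductivity of the soil".
    tokens = ["Biochar", "increased", "the", "hydraulic", "conductivity", "of", "the", "soil"]
    heads = [1, 1, 4, 4, 1, 4, 7, 5]
    print(shortest_dependency_path(heads, 0, 4), classify_relation(tokens, heads, 0, 4))
    # Hand-written parse of "Zero tillage did not affect crop yield".
    tokens2 = ["Zero", "tillage", "did", "not", "affect", "crop", "yield"]
    heads2 = [1, 4, 4, 4, 4, 6, 4]
    print(shortest_dependency_path(heads2, 1, 6), classify_relation(tokens2, heads2, 1, 6))
```

Restricting attention to the path keeps words from unrelated clauses out of the decision, which is the main appeal of the method; the keyword rule on top of it is only a placeholder for whatever classification step is actually used.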
