Article

Chinese Named Entity Recognition in the Geoscience Domain Based on BERT

Journal

EARTH AND SPACE SCIENCE
Volume 9, Issue 3

Publisher

AMER GEOPHYSICAL UNION
DOI: 10.1029/2021EA002166

Funding

  1. National Natural Science Foundation of China [42050101, 41871311, 41871305]
  2. China Postdoctoral Science Foundation [2021M702991]
  3. Open Research Project of The Hubei Key Laboratory of Intelligent Geo-Information Processing [KLIGIP-2021A01]
  4. Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) [CUG2106116]
  5. Wuhan Multi-Element Urban Geological Survey Demonstration Project [WHDYS-2020-004]

Geological named entity recognition is crucial to information extraction and knowledge discovery. The proposed BERT-BiGRU-CRF model is designed specifically to handle the linguistic irregularities of geological text, and it offers an alternative approach that merits further study.

Geologists engaged in geological surveys and scientific research routinely use geological reports to record their results. This rich data source holds a substantial amount of knowledge that has yet to be mined and analyzed. This paper focuses on automatic information extraction from geological reports, specifically geological named entity recognition (NER), which plays an important role in data mining, knowledge discovery and knowledge graph construction. Existing general-purpose NER models and tools are limited in the geoscience domain because of the linguistic irregularities of geological text, such as informal sentence structures, many domain-specific geoscience words, entities with large character lengths and entities formed by combining multiple independent words. We present BERT-BiGRU-CRF, a deep learning-based geological NER model designed specifically with these linguistic irregularities in mind. The model integrates bidirectional encoder representations from transformers (BERT), a bidirectional gated recurrent unit network (BiGRU) and a conditional random field (CRF). The BERT pretrained language model supplies character vectors rich in semantic information, which compensates for the lack of specificity of static word vectors (e.g., word2vec) and improves the extraction of complex geological entities. We demonstrate the proposed model by applying it to four test datasets, including a geoscience NER dataset built from regional geological reports, and by comparing its performance with that of five baseline models.

Plain Language Summary

Geological named entity recognition has an important role in information extraction and knowledge discovery. This paper presents BERT-BiGRU-CRF, a deep learning-based geological named entity recognition model designed specifically with the linguistic irregularities of geological text in mind. We hope that our approach will serve as an alternative method that deserves further study.
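
The abstract does not spell out how the CRF layer turns per-character scores into a tag sequence, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical BIO tag set with a single entity type (`ROCK`), hand-written emission scores standing in for what the BERT and BiGRU layers would produce, and hard-coded transition constraints where a trained CRF would learn transition scores. Viterbi decoding then picks the highest-scoring tag path that never places an `I-` tag without a preceding `B-` or `I-` tag of the same type.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set with one entity type; not taken from the paper.
TAGS = ["O", "B-ROCK", "I-ROCK"]
NEG_INF = float("-inf")


def make_bio_transitions(tags: List[str]) -> Dict[Tuple[str, str], float]:
    """Score 0.0 for allowed tag transitions, -inf for invalid BIO moves.

    A trained CRF would learn these scores; here they are fixed constraints.
    """
    trans = {}
    for prev in tags:
        for cur in tags:
            allowed = True
            if cur.startswith("I-"):
                etype = cur[2:]
                # I-X may only follow B-X or I-X.
                allowed = prev in ("B-" + etype, "I-" + etype)
            trans[(prev, cur)] = 0.0 if allowed else NEG_INF
    return trans


def viterbi_decode(
    emissions: List[Dict[str, float]],
    tags: List[str],
    trans: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring tag path for one character sequence."""
    if not emissions:
        return []
    # A sequence cannot start inside an entity.
    score = {
        t: (NEG_INF if t.startswith("I-") else emissions[0][t]) for t in tags
    }
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for cur in tags:
            best_prev = max(tags, key=lambda p: score[p] + trans[(p, cur)])
            new_score[cur] = score[best_prev] + trans[(best_prev, cur)] + em[cur]
            ptr[cur] = best_prev
        score = new_score
        backpointers.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Made-up emission scores for the four characters of "花岗岩体" (granite body).
emissions = [
    {"O": 0.1, "B-ROCK": 2.0, "I-ROCK": 0.5},
    {"O": 0.2, "B-ROCK": 0.1, "I-ROCK": 1.5},
    {"O": 0.3, "B-ROCK": 0.2, "I-ROCK": 1.8},
    {"O": 1.0, "B-ROCK": 0.1, "I-ROCK": 0.9},
]
print(viterbi_decode(emissions, TAGS, make_bio_transitions(TAGS)))
# prints ['B-ROCK', 'I-ROCK', 'I-ROCK', 'O']
```

Decoding the whole sequence jointly, rather than taking the best tag for each character independently, is what lets a CRF-style layer rule out malformed entity spans in long, multi-character names.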
