Article

Revealing the technology development of natural language processing: A scientific entity-centric perspective

Journal

INFORMATION PROCESSING & MANAGEMENT
Volume 61, Issue 1

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2023.103574

Keywords

Technology development; Automatic scientific entity identification; Entity co-occurrence network; z-score metric


This study analyzes technology development in the field of Natural Language Processing (NLP) from an entity-centric perspective. The findings indicate an increase in the average number of entities per paper, with pre-trained language models becoming mainstream and the impact of the Wikipedia dataset and the BLEU metric continuing to rise. New high-impact technologies have surged in popularity in recent years, with researchers adopting them at unprecedented speed.
Most studies on technology development have been conducted from a thematic perspective, but topics are coarse-grained and insufficient to accurately represent technologies. The development of automatic entity recognition techniques makes it possible to extract technology-related entities on a large scale. We therefore perform a more accurate analysis of technology development from an entity-centric perspective. To begin with, we extract technology-related entities such as methods, datasets, metrics, and tools from articles on Natural Language Processing (NLP), and we apply a semi-automatic approach to normalize the entities. Subsequently, we calculate the z-scores of entities based on their co-occurrence networks to measure their impact. We then analyze the development trends of new technologies in the NLP domain since the beginning of the 21st century. The findings of this paper cover three aspects. Firstly, the continued increase in the average number of entities per paper implies a growing burden on researchers to acquire relevant technical background knowledge; however, the emergence of pre-trained language models has injected new vitality into technological innovation in the NLP domain. Secondly, methods dominate among the 179 high-impact entities. An analysis of the z-score trends of the top 10 entities reveals that pre-trained language models, exemplified by BERT and Transformer, have become mainstream in recent years. Unlike the other eight method entities, the impact of the Wikipedia dataset and the BLEU metric has continued to rise over the long term. Thirdly, in recent years new high-impact technologies have surged in popularity more than ever before, and researchers have accepted them at an unprecedented speed. Our study provides a new perspective for analyzing technology development in a specific domain.
This work can help researchers grasp the current hot technologies in the field of NLP and contribute to the future development of NLP technologies. The source code and dataset for this paper can be accessed at https://github.com/ZH-heng/technology_development.
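The abstract describes measuring an entity's impact via a z-score computed over its co-occurrence network, but does not give the exact formula. The following minimal sketch assumes one common formulation: build a weighted co-occurrence network from the entities mentioned in each paper, take each entity's weighted degree, and standardize it against the mean and standard deviation across all entities. The entity names and paper lists here are illustrative, not from the paper's dataset.

```python
from collections import Counter
from itertools import combinations
import statistics

# Illustrative input: the normalized technology-related entities
# mentioned in each paper (hypothetical data).
papers = [
    ["BERT", "GLUE", "F1"],
    ["BERT", "Transformer", "BLEU"],
    ["Transformer", "Wikipedia", "BLEU"],
    ["BERT", "Wikipedia", "F1"],
]

# Weighted co-occurrence network: each pair of entities appearing in
# the same paper adds one co-occurrence to the edge between them.
cooc = Counter()
for entities in papers:
    for a, b in combinations(sorted(set(entities)), 2):
        cooc[(a, b)] += 1

# Weighted degree of each entity (sum of its edge weights).
degree = Counter()
for (a, b), w in cooc.items():
    degree[a] += w
    degree[b] += w

# z-score: how far an entity's degree deviates from the mean degree,
# in units of the (population) standard deviation.
mean = statistics.mean(degree.values())
stdev = statistics.pstdev(degree.values())
zscores = {e: (d - mean) / stdev for e, d in degree.items()}
```

On this toy data, BERT co-occurs most often and gets the highest z-score, while a one-off entity like GLUE falls below the mean; the actual paper's metric may normalize differently (for instance, per time window), so this should be read as a sketch of the general approach only.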

