Revealing the Technology Development of Natural Language Processing: A Scientific Entity-Centric Perspective

This study analyzes the development of technologies in Natural Language Processing (NLP) from an entity-centric perspective, extracting methods, datasets, metrics, and tools to measure their impact via co-occurrence networks. The research reveals that while pre-trained language models like BERT and Transformer have become mainstream, the average number of entities per paper is increasing, indicating a growing knowledge burden for researchers.

The study extracts technology-related entities from NLP articles, normalizes them with a semi-automatic approach, and measures each entity's impact with z-scores computed on co-occurrence networks.
Methods dominate among the 179 high-impact entities identified, with pre-trained language models such as BERT and Transformer becoming mainstream in recent years.
Unlike method entities, the Wikipedia dataset and the BLEU metric have seen their impact continue to rise over the long term.
New high-impact technologies are surging in popularity, and researchers are adopting them faster than ever before.
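
To make the impact measure concrete, the following is a minimal sketch of a z-score computed on an entity co-occurrence network. It rests on assumptions not stated in this summary: impact is taken to be an entity's weighted degree (the number of paper-level co-occurrences with other entities), standardized across all entities. The function names and the toy papers are hypothetical, and the study's actual formula may differ.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def cooccurrence_edges(papers):
    """Count, for each unordered entity pair, the papers in which both appear."""
    edges = Counter()
    for entities in papers:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights incident to each entity (assumed impact proxy)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        return {key: 0.0 for key in values}
    return {key: (value - mu) / sigma for key, value in values.items()}


# Toy input: each set holds the normalized entities found in one paper (made up).
papers = [
    {"BERT", "Wikipedia", "BLEU"},
    {"BERT", "Transformer", "BLEU"},
    {"Transformer", "BLEU"},
    {"LSTM", "Wikipedia"},
]

edges = cooccurrence_edges(papers)
degree = weighted_degree(edges)
scores = z_scores(degree)

for entity, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{entity:12s} degree={degree[entity]} z={score:+.2f}")
```

In this sketch an entity with a z-score well above zero co-occurs with other entities more often than is typical for the corpus. Entities that never co-occur with anything do not enter the network and therefore receive no score, which is a simplification of this example.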

This approach provides a more accurate analysis of technology development trends than coarse-grained thematic perspectives, highlighting how pre-trained models have injected new vitality into NLP innovation.