Article

What does Chinese BERT learn about syntactic knowledge?

Journal

PEERJ COMPUTER SCIENCE
Volume 9

Publisher

PEERJ INC
DOI: 10.7717/peerj-cs.1478

Keywords

Chinese; BERT; Syntax; Fine-tune; NLP

Abstract

This study used probing methods to identify syntactic knowledge in the attention heads and hidden states of Chinese BERT. The results showed that certain individual heads and combinations of heads performed well in encoding specific and overall syntactic relations, respectively. The hidden representations in each layer also contained varying degrees of syntactic information. The analysis of fine-tuned models for different tasks revealed changes in language structure conservation. These findings help explain the significant improvements achieved by Chinese BERT in various language-processing tasks.
Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have been applied to a wide range of natural language processing (NLP) tasks and have obtained significantly positive results. A growing body of research has investigated why BERT is so effective and what linguistic knowledge it is able to learn. However, most of these works have focused almost exclusively on English. Few studies have explored the linguistic information, particularly syntactic information, that BERT learns in Chinese, which is written as sequences of characters. In this study, we adopted probing methods to identify the syntactic knowledge stored in the attention heads and hidden states of Chinese BERT. The results suggest that some individual heads and combinations of heads encode specific and overall syntactic relations well, respectively. The hidden representations of each layer also contain syntactic information to varying degrees. We also analyzed fine-tuned Chinese BERT models for different tasks, covering all linguistic levels. Our results suggest that these fine-tuned models reflect changes in how well language structure is conserved. These findings help explain why Chinese BERT can achieve such large improvements across many language-processing tasks.
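
To make the attention-head probing concrete, the sketch below shows one simple way to inspect Chinese BERT's attention weights and hidden states with the Hugging Face `transformers` and `torch` libraries and the public "bert-base-chinese" checkpoint. This is a minimal illustration, not the authors' code: the example sentence, the hand-picked (dependent, head) token positions, and the scoring rule (how strongly the dependent position attends to the head position) are all hypothetical stand-ins for the paper's probing setup.

```python
# Minimal sketch (assumed setup, not the authors' implementation):
# probe attention heads of Chinese BERT for a single syntactic pair.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained(
    "bert-base-chinese",
    output_attentions=True,
    output_hidden_states=True,
)
model.eval()

sentence = "我喜欢自然语言处理"  # "I like natural language processing"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# attentions: tuple of 12 tensors (one per layer), each of shape
# (batch, num_heads, seq_len, seq_len); hidden_states: 13 tensors
# (embedding layer + 12 Transformer layers) for layer-wise probes.
attentions = outputs.attentions
hidden_states = outputs.hidden_states

# Toy head-level probe: for a hand-picked (dependent, head) character pair,
# score every attention head by the weight the dependent position assigns
# to the head position, then report the strongest head.
dep_pos, head_pos = 2, 1  # hypothetical token positions, illustration only
scores = torch.stack(
    [layer_att[0, :, dep_pos, head_pos] for layer_att in attentions]
)  # shape: (num_layers, num_heads)

flat = torch.argmax(scores).item()
layer, head = divmod(flat, scores.size(1))
print(f"Strongest head for this pair: layer {layer}, head {head}, "
      f"weight {scores[layer, head].item():.3f}")
```

In the same spirit, the per-layer `hidden_states` could be fed to a small diagnostic classifier to estimate how much syntactic information each layer's representations carry; the paper's actual probing tasks and evaluation protocol are of course more involved than this toy example.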
