Article

What does Chinese BERT learn about syntactic knowledge?

Journal

PEERJ COMPUTER SCIENCE
Volume 9

Publisher

PEERJ INC
DOI: 10.7717/peerj-cs.1478

Keywords

Chinese; BERT; Syntax; Fine-tune; NLP

Abstract

This study used probing methods to identify syntactic knowledge in the attention heads and hidden states of Chinese BERT. The results showed that certain individual heads and combinations of heads performed well in encoding specific and overall syntactic relations, respectively. The hidden representations in each layer also contained varying degrees of syntactic information. The analysis of fine-tuned models for different tasks revealed changes in language structure conservation. These findings help explain the significant improvements achieved by Chinese BERT in various language-processing tasks.
Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have been applied to a wide range of natural language processing (NLP) tasks and have obtained significantly positive results. A growing body of research has investigated why BERT is so effective and what linguistic knowledge it is able to learn. However, most of these works have focused almost exclusively on English. Few studies have explored the linguistic information, particularly syntactic information, that BERT learns in Chinese, which is written as sequences of characters. In this study, we adopted probing methods to identify the syntactic knowledge stored in the attention heads and hidden states of Chinese BERT. The results suggest that some individual heads and combinations of heads encode specific and overall syntactic relations well, respectively. The hidden representations of each layer also contain syntactic information to varying degrees. We also analyzed fine-tuned Chinese BERT models for different tasks, covering all linguistic levels. Our results suggest that these fine-tuned models reflect changes in how well language structure is conserved. These findings help explain why Chinese BERT can achieve such large improvements across many language-processing tasks.
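
To make the attention-head probing concrete, the sketch below shows one simple way to inspect Chinese BERT's attention weights and hidden states with the Hugging Face `transformers` and `torch` libraries and the public "bert-base-chinese" checkpoint. This is a minimal illustration, not the authors' code: the example sentence, the hand-picked (dependent, head) token positions, and the scoring rule (how strongly the dependent position attends to the head position) are all hypothetical stand-ins for the paper's probing setup.

```python
# Minimal sketch (assumed setup, not the authors' implementation):
# probe attention heads of Chinese BERT for a single syntactic pair.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained(
    "bert-base-chinese",
    output_attentions=True,
    output_hidden_states=True,
)
model.eval()

sentence = "我喜欢自然语言处理"  # "I like natural language processing"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# attentions: tuple of 12 tensors (one per layer), each of shape
# (batch, num_heads, seq_len, seq_len); hidden_states: 13 tensors
# (embedding layer + 12 Transformer layers) for layer-wise probes.
attentions = outputs.attentions
hidden_states = outputs.hidden_states

# Toy head-level probe: for a hand-picked (dependent, head) character pair,
# score every attention head by the weight the dependent position assigns
# to the head position, then report the strongest head.
dep_pos, head_pos = 2, 1  # hypothetical token positions, illustration only
scores = torch.stack(
    [layer_att[0, :, dep_pos, head_pos] for layer_att in attentions]
)  # shape: (num_layers, num_heads)

flat = torch.argmax(scores).item()
layer, head = divmod(flat, scores.size(1))
print(f"Strongest head for this pair: layer {layer}, head {head}, "
      f"weight {scores[layer, head].item():.3f}")
```

In the same spirit, the per-layer `hidden_states` could be fed to a small diagnostic classifier to estimate how much syntactic information each layer's representations carry; the paper's actual probing tasks and evaluation protocol are of course more involved than this toy example.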
