Article

Improving cross-lingual language understanding with consistency regularization-based fine-tuning

Journal

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s13042-023-01854-1

Keywords

Cross-lingual; Consistency regularization; Data augmentation; Few-shot

This study proposes a method for improving cross-lingual language understanding through consistency regularization-based fine-tuning. By penalizing the sensitivity of predictions to different data augmentations, the method transfers task-specific supervision between languages. Experiments show significant improvements across a range of cross-lingual language understanding tasks.

Fine-tuning pre-trained cross-lingual language models alleviates the need for annotated data in different languages, as it allows the models to transfer task-specific supervision between languages, especially from high- to low-resource languages. In this work, we propose to improve cross-lingual language understanding with consistency regularization-based fine-tuning. Specifically, we use example consistency regularization to penalize the sensitivity of predictions to four types of data augmentation: subword sampling, Gaussian noise, code-switch substitution, and machine translation. In addition, we employ model consistency regularization to constrain models trained on two augmented versions of the same training set. Experimental results on the XTREME benchmark show that our method (the code is available at https://github.com/bozheng-hit/xTune) achieves significant improvements across various cross-lingual language understanding tasks, including text classification, question answering, and sequence labeling. Furthermore, we extend our method to the few-shot cross-lingual transfer setting, in particular a more realistic setting in which machine translation systems are available. Machine translation as a form of data augmentation combines well with our consistency regularization, and experimental results demonstrate that our method also benefits the few-shot scenario.
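
For intuition, the example consistency objective described in the abstract can be sketched as a symmetrized KL penalty between the model's predictions on an example and on an augmented version of it, added to the usual task loss. The following is a minimal PyTorch sketch under that reading; the function names, the single-augmentation setup, and the `reg_weight` coefficient are illustrative assumptions, not the authors' xTune implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def example_consistency_loss(logits_orig: torch.Tensor,
                             logits_aug: torch.Tensor) -> torch.Tensor:
    """Symmetrized KL divergence between the predictions on the original
    examples and on their augmented counterparts."""
    log_p = F.log_softmax(logits_orig, dim=-1)
    log_q = F.log_softmax(logits_aug, dim=-1)
    # 0.5 * (KL(P || Q) + KL(Q || P)), averaged over the batch.
    return 0.5 * (
        F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)
        + F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
    )

def training_step(model, inputs, inputs_aug, labels,
                  reg_weight: float = 1.0) -> torch.Tensor:
    """One fine-tuning step: the usual task loss on the original inputs plus
    a consistency penalty tying the two sets of predictions together."""
    logits_orig = model(inputs)       # predictions on the original examples
    logits_aug = model(inputs_aug)    # predictions on the augmented examples
    task_loss = F.cross_entropy(logits_orig, labels)
    return task_loss + reg_weight * example_consistency_loss(logits_orig,
                                                             logits_aug)
```

Model consistency regularization would be analogous, except that the penalty ties together the predictions of two models, each trained on a differently augmented copy of the training set, rather than two views of the same batch.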
