3.8 Proceedings Paper

M6: Multi-Modality-to-Multi-Modality Multitask Mega-transformer for Unified Pretraining

出版社

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3447548.3467206

关键词

Multi-modal pretraining; Large-scale pretraining; Cross-modal understanding and generation

向作者/读者索取更多资源

This work introduces the largest dataset for pretraining in Chinese and proposes a method called M6 for multimodal pretraining, achieving significant success in various domains.
Multimodal pretraining has demonstrated success in the downstream tasks of cross-modal representation learning. However, it is limited to the English data, and there is still a lack of large-scale dataset for multimodal pretraining in Chinese. In this work, we propose the largest dataset for pretraining in Chinese, which consists of over 1.9TB images and 292GB texts. The dataset has large coverage over domains, including encyclopedia, question answering, forum discussion, etc. Besides, we propose a method called M6, referring to Multi-Modality-to-Multi-Modality Multitask Mega-transformer, for unified pretraining on the data of single modality and multiple modalities. The model is pretrained with our proposed tasks, including text-to-text transfer, image-to-text transfer, as well as multi-modality-to-text transfer. The tasks endow the model with strong capability of understanding and generation. We scale the model to 10 billion parameters, and build the largest pretrained model in Chinese. Experimental results show that our proposed M6 outperforms the baseline in a number of downstream tasks concerning both single modality and multiple modalities, and the 10B-parameter pretrained model demonstrates strong potential in the setting of zero-shot learning.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

3.8
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据