☆ 3.8 Proceedings Paper

M6: Multi-Modality-to-Multi-Modality Multitask Mega-transformer for Unified Pretraining

KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING (2021)

期刊

KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING

卷 -, 期 -, 页码 3251-3261

出版社

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3447548.3467206

关键词

Multi-modal pretraining; Large-scale pretraining; Cross-modal understanding and generation

类别

Computer Science, Artificial Intelligence Computer Science, Information Systems Computer Science, Theory & Methods

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This work introduces the largest dataset for pretraining in Chinese and proposes a method called M6 for multimodal pretraining, achieving significant success in various domains.

Multimodal pretraining has demonstrated success in the downstream tasks of cross-modal representation learning. However, it is limited to the English data, and there is still a lack of large-scale dataset for multimodal pretraining in Chinese. In this work, we propose the largest dataset for pretraining in Chinese, which consists of over 1.9TB images and 292GB texts. The dataset has large coverage over domains, including encyclopedia, question answering, forum discussion, etc. Besides, we propose a method called M6, referring to Multi-Modality-to-Multi-Modality Multitask Mega-transformer, for unified pretraining on the data of single modality and multiple modalities. The model is pretrained with our proposed tasks, including text-to-text transfer, image-to-text transfer, as well as multi-modality-to-text transfer. The tasks endow the model with strong capability of understanding and generation. We scale the model to 10 billion parameters, and build the largest pretrained model in Chinese. Experimental results show that our proposed M6 outperforms the baseline in a number of downstream tasks concerning both single modality and multiple modalities, and the 10B-parameter pretrained model demonstrates strong potential in the setting of zero-shot learning.

M6: Multi-Modality-to-Multi-Modality Multitask Mega-transformer for Unified Pretraining

期刊

KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING

出版社

ASSOC COMPUTING MACHINERY

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

M6: Multi-Modality-to-Multi-Modality Multitask Mega-transformer for Unified Pretraining

期刊

KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING

出版社

ASSOC COMPUTING MACHINERY

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文