Article

A survey of transformer-based multimodal pre-trained models

Journal

Neurocomputing
Volume 515, Pages 89-106

Publisher

Elsevier
DOI: 10.1016/j.neucom.2022.09.136

Keywords

Transformer; Pre-trained model; Multimodal; Document layout


Abstract

With the broad industrialization of Artificial Intelligence (AI), we observe that a large fraction of real-world AI applications are multimodal in nature, in terms of both the relevant data and the ways of interaction. Pre-trained big models have proven to be the most effective framework for jointly modeling multimodal data. This paper provides a thorough account of the opportunities and challenges of Transformer-based multimodal pre-trained models (PTMs) in various domains. We begin by reviewing representative tasks of multimodal AI applications, ranging from vision-text and audio-text fusion to more complex tasks such as document layout understanding, a new multimodal research domain that we address in particular. We further analyze and compare state-of-the-art Transformer-based multimodal PTMs from multiple aspects, including downstream applications, datasets, input feature embeddings, and model architectures. In conclusion, we summarize the key challenges of this field and suggest several future research directions. © 2022 Elsevier B.V. All rights reserved.
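
To make the fusion idea in the abstract concrete, below is a minimal sketch, not taken from the paper: all module names, dimensions, and the single-stream design are illustrative assumptions. It shows one common way a Transformer can jointly model vision and text: project each modality into a shared embedding space, tag tokens with a modality-type embedding, concatenate the two sequences, and let a standard Transformer encoder attend across both modalities.

# Minimal single-stream vision-text fusion sketch (illustrative only;
# module names and dimensions are assumptions, not the paper's method).
import torch
import torch.nn as nn

class ToyMultimodalEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2,
                 vocab_size=1000, image_feat_dim=512):
        super().__init__()
        # Modality-specific input embeddings.
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.image_proj = nn.Linear(image_feat_dim, d_model)
        # Learned modality-type embeddings distinguish the two streams.
        self.type_embed = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, text_ids, image_feats):
        # text_ids: (B, T_text) token ids; image_feats: (B, T_img, image_feat_dim)
        txt = self.text_embed(text_ids) + self.type_embed.weight[0]
        img = self.image_proj(image_feats) + self.type_embed.weight[1]
        fused = torch.cat([txt, img], dim=1)  # one joint token sequence
        return self.encoder(fused)            # self-attention spans modalities

# Usage: a batch of 2 captions (8 tokens) and 2 images (16 region features).
model = ToyMultimodalEncoder()
out = model(torch.randint(0, 1000, (2, 8)), torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 24, 256])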


