Article

A survey of transformer-based multimodal pre-trained models

Journal

Neurocomputing
Volume 515, Pages 89-106

Publisher

Elsevier
DOI: 10.1016/j.neucom.2022.09.136

Keywords

Transformer; Pre-trained model; Multimodal; Document layout


Abstract

With the broad industrialization of Artificial Intelligence (AI), we observe that a large fraction of real-world AI applications are multimodal in nature, in terms of both the relevant data and the ways of interaction. Pre-trained big models have proven to be the most effective framework for jointly modeling multimodal data. This paper provides a thorough account of the opportunities and challenges of Transformer-based multimodal pre-trained models (PTMs) in various domains. We begin by reviewing representative tasks of multimodal AI applications, ranging from vision-text and audio-text fusion to more complex tasks such as document layout understanding, a new multimodal research domain that we address in particular. We further analyze and compare state-of-the-art Transformer-based multimodal PTMs from multiple aspects, including downstream applications, datasets, input feature embeddings, and model architectures. In conclusion, we summarize the key challenges of this field and suggest several future research directions. © 2022 Elsevier B.V. All rights reserved.
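
To make the fusion idea in the abstract concrete, below is a minimal sketch, not taken from the paper: all module names, dimensions, and the single-stream design are illustrative assumptions. It shows one common way a Transformer can jointly model vision and text: project each modality into a shared embedding space, tag tokens with a modality-type embedding, concatenate the two sequences, and let a standard Transformer encoder attend across both modalities.

# Minimal single-stream vision-text fusion sketch (illustrative only;
# module names and dimensions are assumptions, not the paper's method).
import torch
import torch.nn as nn

class ToyMultimodalEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2,
                 vocab_size=1000, image_feat_dim=512):
        super().__init__()
        # Modality-specific input embeddings.
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.image_proj = nn.Linear(image_feat_dim, d_model)
        # Learned modality-type embeddings distinguish the two streams.
        self.type_embed = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, text_ids, image_feats):
        # text_ids: (B, T_text) token ids; image_feats: (B, T_img, image_feat_dim)
        txt = self.text_embed(text_ids) + self.type_embed.weight[0]
        img = self.image_proj(image_feats) + self.type_embed.weight[1]
        fused = torch.cat([txt, img], dim=1)  # one joint token sequence
        return self.encoder(fused)            # self-attention spans modalities

# Usage: a batch of 2 captions (8 tokens) and 2 images (16 region features).
model = ToyMultimodalEncoder()
out = model(torch.randint(0, 1000, (2, 8)), torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 24, 256])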


