Journal
IEEE ACCESS
Volume 11, Issue -, Pages 80624-80646Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2023.3299877
Keywords
Task analysis; Data models; Deep learning; Transformers; Visualization; Training; Surveys; Question answering (information retrieval); Image segmentation; Image texture analysis; Attention mechanism; data fusion; multimodal learning; vision-language classification; vision-language question-answering; vision-language segmentation
Ask authors/readers for more resources
This paper discusses attention-based deep learning approaches on vision-language multimodal data, including models, performances, and evaluation metrics. A comprehensive review was conducted on 75 articles from 2015 to 2022, discussing current tasks, datasets, application areas, and future directions.
Multimodal learning has gained immense popularity due to the explosive growth in the volume of image and textual data in various domains. Vision-language heterogeneous multimodal data has been utilized to solve a variety of tasks including classification, image segmentation, image captioning, question-answering, etc. Consequently, several attention mechanism-based approaches with deep learning have been proposed on image-text multimodal data. In this paper, we highlight the current status of attention-based deep learning approaches on vision-language multimodal data by presenting a detailed description of the existing models, their performances and the variety of evaluation metrics used therein. We revisited the various attention mechanisms on image-text multimodal data since its inception in 2015 till 2022 and considered a total of 75 articles for the survey. Our comprehensive discussion also encompasses the current tasks, datasets, application areas and future directions in this domain. This is the very first attempt to discuss the vast scope of attention-based deep learning mechanisms on image-text multimodal data.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available