Journal
IEEE TRANSACTIONS ON MULTIMEDIA
Volume 25, Pages 6575-6587
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2022.3211197
Keywords
Transformers; Feature extraction; Discrete Fourier transforms; Computational modeling; Affective computing; Visualization; Fuses; Multimodal emotion recognition; multimodal fusion; multimodal representation learning; multimodal sentiment analysis
This paper proposes a dense fusion transformer (DFT) framework to integrate textual, acoustic, and visual information for multimodal affective computing. DFT exploits a modality-shared transformer (MT) module to extract modality-shared features by jointly modelling unimodal, bimodal, and trimodal interactions. MT constructs a series of dense fusion blocks to fuse utterance-level sequential features of multiple modalities at both low-level and high-level semantics. In particular, MT adopts local and global transformers to learn modality-shared representations by modelling inter- and intra-modality interactions. Furthermore, we devise a modality-specific representation (MR) module with a soft orthogonality constraint that penalizes the distance between modality-specific and modality-shared representations; the two representations are then fused by a transformer to make affective predictions. Extensive experiments conducted on five public benchmark datasets show that DFT outperforms the state-of-the-art baselines.
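The abstract does not give the exact form of the soft orthogonality constraint. A common formulation in multimodal representation learning penalizes the squared Frobenius norm of the cross-correlation between the modality-specific and modality-shared feature matrices; the sketch below assumes that form and uses hypothetical array shapes purely for illustration.

```python
import numpy as np

def soft_orthogonality_loss(h_specific, h_shared):
    """Assumed form of the soft orthogonality penalty: the squared
    Frobenius norm of h_specific^T @ h_shared. It is zero when the
    two representation subspaces are orthogonal and grows as the
    modality-specific features overlap with the shared ones."""
    return float(np.sum((h_specific.T @ h_shared) ** 2))

# Toy example: two (sequence_length=2, hidden_dim=2) feature matrices.
orthogonal_specific = np.array([[1.0, 0.0], [0.0, 0.0]])
orthogonal_shared   = np.array([[0.0, 0.0], [0.0, 1.0]])
aligned_specific    = np.array([[1.0, 0.0], [0.0, 0.0]])
aligned_shared      = np.array([[1.0, 0.0], [0.0, 0.0]])

print(soft_orthogonality_loss(orthogonal_specific, orthogonal_shared))  # 0.0
print(soft_orthogonality_loss(aligned_specific, aligned_shared))        # 1.0
```

In training, this penalty would be added to the affective-prediction loss so that the MR module learns features complementary to, rather than redundant with, the modality-shared ones.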