A Multimodal Aggregation Network With Serial Self-Attention Mechanism for Micro-Video Multi-Label Classification

Article Computer Science, Information Systems

Heterogeneous Hierarchical Feature Aggregation Network for Personalized Micro-Video Recommendation

Desheng Cai et al.

Summary: This paper proposes a novel Heterogeneous Hierarchical Feature Aggregation Network (HHFAN) for personalized micro-video recommendation. The network aims to explore the relationships among users, micro-videos, and related multi-modal information, and generate high-quality user and micro-video embeddings. Experimental results demonstrate that the proposed model outperforms baseline methods.

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Article Computer Science, Artificial Intelligence

Learning view-specific labels and label-feature dependence maximization for multi-view multi-label classification

Dawei Zhao et al.

Summary: This paper proposes a novel method for multi-view multi-label learning that enhances learning effectiveness by using view-specific labels and maximizing label-feature dependence. Experimental results show that the proposed method outperforms existing methods on several benchmark datasets.

APPLIED SOFT COMPUTING (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Video Swin Transformer

Ze Liu et al.

Summary: This paper introduces a Transformer architecture for video recognition with a bias toward locality, which gives a better speed-accuracy trade-off than approaches based on global self-attention. By adapting the Swin Transformer and leveraging pre-trained models, it achieves state-of-the-art accuracy on a range of video recognition benchmarks.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Article Engineering, Electrical & Electronic

A Cross-Attention BERT-Based Framework for Continuous Sign Language Recognition

Zhenxing Zhou et al.

Summary: Continuous sign language recognition (CSLR) is a challenging task in which multiple input modalities are used to improve recognition accuracy. However, the differences between modalities make it difficult to define an integrative framework. To address this, the paper proposes a deep learning framework called CA-SignBERT, which uses multiple BERT models and a special cross-attention mechanism to analyze information from the different modalities.

IEEE SIGNAL PROCESSING LETTERS (2022)

Article Engineering, Electrical & Electronic

Learning Social Relationship From Videos via Pre-Trained Multimodal Transformer

Yiyang Teng et al.

Summary: This study proposes a pre-trained multimodal feature learning framework that is trained on unlabeled video data through self-supervised learning and then applied to social relationship recognition. A multimodal instance interaction transformer is designed to capture interactions between visual and textual information. With pre-training, the framework achieves state-of-the-art results on a public benchmark.

IEEE SIGNAL PROCESSING LETTERS (2022)

Article Engineering, Electrical & Electronic

Multi-Label Classification of Fundus Images With Graph Convolutional Network and Self-Supervised Learning

Jinke Lin et al.

Summary: This study addresses the multi-label classification of fundus images. It proposes two new multi-label classification networks, based on graph convolutional network and self-supervised learning, that improve classification performance and generalization ability by capturing relevant information and learning from unannotated data.

IEEE SIGNAL PROCESSING LETTERS (2021)

Article Computer Science, Information Systems

Learning and Fusing Multiple User Interest Representations for Micro-Video and Movie Recommendations

Xusong Chen et al.

Summary: The paper focuses on learning and fusing multiple kinds of user interest representations, including latent representation, item-level representation, neighbor-assisted representation, and category-level representation. The proposed method is validated on two real-world video recommendation datasets, demonstrating significant performance improvement over existing state-of-the-art techniques.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Computer Science, Information Systems