Article

A Multimodal Aggregation Network With Serial Self-Attention Mechanism for Micro-Video Multi-Label Classification

Related references

Note: Only some of the references are listed.
Article Computer Science, Information Systems

Heterogeneous Hierarchical Feature Aggregation Network for Personalized Micro-Video Recommendation

Desheng Cai et al.

Summary: This paper proposes a novel Heterogeneous Hierarchical Feature Aggregation Network (HHFAN) for personalized micro-video recommendation. The network aims to explore the relationships among users, micro-videos, and related multi-modal information, and generate high-quality user and micro-video embeddings. Experimental results demonstrate that the proposed model outperforms baseline methods.

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Article Computer Science, Artificial Intelligence

Learning view-specific labels and label-feature dependence maximization for multi-view multi-label classification

Dawei Zhao et al.

Summary: This paper proposes a novel method for multi-view multi-label learning that enhances learning effectiveness by using view-specific labels and maximizing label-feature dependence. Experimental results show that the proposed method outperforms existing methods on several benchmark datasets.

APPLIED SOFT COMPUTING (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Video Swin Transformer

Ze Liu et al.

Summary: This paper introduces a Transformer architecture for video recognition with an inductive bias towards locality, which achieves a better speed-accuracy trade-off than approaches that compute self-attention globally. By adapting the Swin Transformer and leveraging pre-trained models, it achieves state-of-the-art accuracy on several video recognition benchmarks.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) (2022)

Article Engineering, Electrical & Electronic

A Cross-Attention BERT-Based Framework for Continuous Sign Language Recognition

Zhenxing Zhou et al.

Summary: Continuous sign language recognition (CSLR) is a challenging task in which multiple input modalities are used to improve recognition accuracy. However, differences between modalities make it difficult to design an integrated framework. To address this, the paper proposes CA-SignBERT, a deep learning framework that uses multiple BERT models and a special cross-attention mechanism to analyze information from different modalities.

IEEE SIGNAL PROCESSING LETTERS (2022)

Article Engineering, Electrical & Electronic

Learning Social Relationship From Videos via Pre-Trained Multimodal Transformer

Yiyang Teng et al.

Summary: This study proposes a pre-trained multimodal feature learning framework that trains the model on unlabeled video data through self-supervised learning and then applies it to social relationship recognition. A multimodal instance interaction transformer is designed to capture interactions between visual and textual information, and with pre-training the framework achieves state-of-the-art results on a public benchmark.

IEEE SIGNAL PROCESSING LETTERS (2022)

Article Engineering, Electrical & Electronic

Multi-Label Classification of Fundus Images With Graph Convolutional Network and Self-Supervised Learning

Jinke Lin et al.

Summary: This study addresses the multi-label classification of fundus images. It proposes two new multi-label classification networks based on graph convolutional networks and self-supervised learning, which improve classification performance and generalization by capturing relevant information and learning from unannotated data.

IEEE SIGNAL PROCESSING LETTERS (2021)

Article Computer Science, Information Systems

Learning and Fusing Multiple User Interest Representations for Micro-Video and Movie Recommendations

Xusong Chen et al.

Summary: The paper focuses on learning and fusing multiple kinds of user interest representations: latent, item-level, neighbor-assisted, and category-level representations. The proposed method is validated on two real-world video recommendation datasets and shows significant performance improvements over existing state-of-the-art techniques.

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Article Computer Science, Information Systems

Multi-modal sequence model with gated fully convolutional blocks for micro-video venue classification

Wei Liu et al.

MULTIMEDIA TOOLS AND APPLICATIONS (2020)

Proceedings Paper Computer Science, Information Systems

A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction

Jiayi Xie et al.

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) (2020)

Article Engineering, Electrical & Electronic

Modeling Label Dependencies for Audio Tagging With Graph Convolutional Network

Helin Wang et al.

IEEE SIGNAL PROCESSING LETTERS (2020)

Article Computer Science, Artificial Intelligence

Manifold regularized discriminative feature selection for multi-label learning

Jia Zhang et al.

PATTERN RECOGNITION (2019)

Article Computer Science, Artificial Intelligence

Multi-Label Learning with Global and Local Label Correlation

Yue Zhu et al.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2018)

Proceedings Paper Computer Science, Artificial Intelligence

Learning Spatiotemporal Features with 3D Convolutional Networks

Du Tran et al.

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2015)