Related References
Note: only some of the references are listed here; the full list is in the original article.
Article
Computer Science, Artificial Intelligence
Zheng Zhang et al.
Summary: In this paper, we propose a novel Modality-Invariant Asymmetric Networks (MIAN) architecture that explores asymmetric intra- and inter-modal similarity preservation under a probabilistic modality alignment framework. MIAN outperforms state-of-the-art cross-modal hashing methods. The approach incorporates pairwise, piecewise, and transformed semantics into a unified semantic-preserving hash code learning scheme.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2023)
Article
Computer Science, Artificial Intelligence
Deyin Liu et al.
Summary: Person image generation conditioned on natural language allows us to personalize image editing in a user-friendly manner. We propose a novel pose-guided multi-granularity attention architecture to synthesize person images. By incorporating sentence-level description and pose feature maps, we generate a coarse person image and further enhance it by drawing human body parts with highly correlated textual nouns and determining the spatial positions with respect to target pose points.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Article
Computer Science, Artificial Intelligence
Mingchen Zhuge et al.
Summary: Although current salient object detection (SOD) works have made significant progress, they are limited in preserving the integrity of predicted salient regions. To address this issue, a novel Integrity Cognition Network (ICON) is proposed, which leverages diverse feature aggregation, integrity channel enhancement, and part-whole verification to learn strong integrity features. Experimental results on seven benchmarks demonstrate that ICON outperforms baseline methods and achieves around 10% improvement in average false negative ratio (FNR). The code and results are available at: https://github.com/mczhuge/ICON.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Kaiming He et al.
Summary: This paper presents a self-supervised learning method for computer vision based on masked autoencoders. By masking a portion of the input image and reconstructing the missing pixels, large models can be trained efficiently and effectively. The approach achieves high generalization performance and outperforms supervised pretraining in transfer learning tasks.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Sonam Goenka et al.
Summary: This study proposes a new vision-language transformer-based model, FashionVLP, for fashion image retrieval. The model utilizes prior knowledge from large image-text corpora and combines visual information from multiple levels of context to effectively capture fashion-related information. The results show that FashionVLP achieves state-of-the-art performance on benchmark datasets and a significant improvement on the challenging FashionIQ dataset with complex natural language feedback.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
(2022)
Article
Automation & Control Systems
Tian-Xiang Sun et al.
Summary: In recent years, with the development of deep learning, modeling for natural language processing tasks has converged into several mainstream paradigms. However, driven by the rapid progress of pre-trained language models, shifting a task from one paradigm to another has become a trend and has succeeded on many tasks. Some paradigms also show potential to unify a large number of NLP tasks.
MACHINE INTELLIGENCE RESEARCH
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Junyang Lin et al.
Summary: This work introduces the largest dataset for pretraining in Chinese and proposes a method called M6 for multimodal pretraining, achieving significant success in various domains.
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Xudong Lin et al.
Summary: VX2TEXT is a text generation framework that converts multimodal inputs (video, text, speech, audio) into language embeddings for fusion, enabling direct application to video-based text generation tasks. It outperforms existing models on video-based text generation tasks and is conceptually simple and effective.
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Mingchen Zhuge et al.
Summary: Kaleido-BERT is a novel vision-language pre-training model that utilizes a unique kaleido strategy to enhance cross-modality representations. By conducting self-supervised training on five different tasks and achieving significant improvements on four downstream tasks, it demonstrates remarkable potential for real-world applications.
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021
(2021)
Proceedings Paper
Computer Science, Information Systems
Dehong Gao et al.
PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20)
(2020)
Proceedings Paper
Computer Science, Artificial Intelligence
Xinyang Yi et al.
RECSYS 2019: 13TH ACM CONFERENCE ON RECOMMENDER SYSTEMS
(2019)
Proceedings Paper
Computer Science, Artificial Intelligence
Zhenxing Niu et al.
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
(2017)