Related references
Note: Only part of the references are listed.
Article
Computer Science, Software Engineering
Pufen Zhang et al.
Summary: A novel Multiple Context Learning Network (MCLN) is proposed for visual question answering (VQA), which models multiple contexts and learns comprehensive contexts. Different context learning modules are introduced to learn object and word contexts, and a deep context learning is achieved by stacking multiple context learning layers. The approach also introduces a contextualized text encoder based on pretrained BERT to enhance the textual context learning. The MCLN outperforms previous state-of-the-art models on benchmark datasets.
SCIENTIFIC PROGRAMMING
(2022)
Article
Computer Science, Information Systems
Abdulganiyu Abdu Yusuf et al.
Summary: Graph neural networks, especially graph convolution network (GCN), have been widely used for vision-to-language tasks, such as visual question answering (VQA), and have shown promising results in capturing spatial and semantic relationships. However, the application of GCN on different subtasks in VQA datasets can lead to varying results. Factors such as training and testing size, evaluation metrics, and hyperparameters also affect the VQA results. This study proposed a GCN framework for VQA that uses fine-tuned word representation to handle reasoning type questions, and the performance of the framework was evaluated using various measures. The results obtained from GQA and VQA 2.0 datasets slightly outperformed most existing methods.
MULTIMEDIA TOOLS AND APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Tianshan Liu et al.
Summary: The article proposes a method for recognizing group activities based on visual-semantic graph neural network and pose-position attentive learning. The method improves the recognition performance of group activities by constructing a bi-modal visual graph and a semantic graph, and utilizing pose and position information for attention aggregation.
Proceedings Paper
Computer Science, Artificial Intelligence
Yehao Li et al.
Summary: This paper proposes a new Transformer-style structure called COS-Net, which integrates semantic comprehension and ordering processes into a single architecture. By utilizing cross-modal retrieval and a semantic ranker, COS-Net achieves superior performance in image captioning tasks.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Chen Liang et al.
Summary: This paper proposes a new task and dataset for examining the abductive reasoning ability of machine intelligence in everyday visual situations. The authors design a strong baseline model and conduct experiments on a large-scale dataset, showing that the model outperforms many famous video-language models in the VAR task but still falls behind human performance.
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022)
(2022)
Article
Computer Science, Information Systems
Yuhao Cheng et al.
Summary: Image-text retrieval is a fundamental task in cross-modal research. Existing methods can be classified into independent representation matching and cross-interaction matching. This article proposes a method called CGMN, which explores both intra- and inter-relations without introducing network interaction. The experiments show that CGMN outperforms state-of-the-art methods in image retrieval and is more efficient than interactive matching methods.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2022)
Article
Computer Science, Information Systems
Yibing Liu et al.
Summary: This article introduces a novel visual attention regularization method called AttReg for better visual grounding in Visual Question Answering (VQA). AttReg identifies essential but ignored image regions for question answering and leverages a mask-guided learning scheme to regularize the visual attention for better focus on these key regions. Extensive experiments have demonstrated the effectiveness of AttReg and it has achieved state-of-the-art accuracy on benchmark datasets.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2022)
Article
Computer Science, Information Systems
Xi Zhu et al.
Summary: This research achieves outstanding performance in the VQA task by proposing an object-difference based graph learner, combining with a soft-attention mechanism and Graph Convolutional Networks. Experimental results demonstrate that the model outperforms baseline methods on the VQA 2.0 dataset.
MULTIMEDIA TOOLS AND APPLICATIONS
(2021)
Article
Computer Science, Artificial Intelligence
Qingxing Cao et al.
Summary: Collaborative reasoning for image-question pairs is important in interpretable visual question answering systems, but current models heavily rely on annotations or rules, leading to either heavy workloads or poor performance. This paper introduces a novel neural network model, PTGRN, which performs global reasoning on a dependency tree parsed from the question, showing superiority over current VQA methods on relational datasets. Visualization results highlight the explainable capability of PTGRN.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2021)
Article
Computer Science, Artificial Intelligence
Liyang Zhang et al.
Summary: The new framework KAN utilizes object-related knowledge and a knowledge graph to assist in the reasoning process of VQA, with an attention module that adaptively balances the importance of external knowledge against detected objects. Extensive experiments demonstrate that KAN achieves state-of-the-art performance on challenging VQA datasets and provides benefits to VQA baselines.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2021)
Article
Computer Science, Artificial Intelligence
Himanshu Sharma et al.
Summary: Visual Question Answering (VQA) is an emerging research area in computer vision and natural language processing, aiming to predict answers to natural questions related to images. However, current VQA approaches often overlook the relationship and reasoning among regions of interest. The proposed VQA model introduced in this paper considers previously attended visual content, leading to improved accuracy in answer prediction.
IMAGE AND VISION COMPUTING
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Zhonghao Wang et al.
Summary: The study focuses on concept induction in visual reasoning, achieving an interpretable model through a new framework named OCCAM. By inducing concepts of objects and relations and imposing OCCAM on the induced symbolic concept space, the model achieves state-of-the-art performance without human-annotated functional programs on CLEVR and GQA datasets.
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021)
(2021)
Proceedings Paper
Computer Science, Artificial Intelligence
Qingxing Cao et al.
Summary: The study introduces a method using capsule networks and Linguistically Routing to improve generalization on out-of-distribution data in visual question answering and shows promising experimental results.
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021)
(2021)
Proceedings Paper
Computer Science, Information Systems
Chen Qu et al.
Summary: This work addresses multi-modal information needs involving text questions and images by focusing on passage retrieval for outside-knowledge visual question answering. The study shows that dense retrieval significantly outperforms sparse retrieval, and that image captions are more informative than object names.
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL
(2021)
Article
Computer Science, Artificial Intelligence
Weifeng Zhang et al.
INFORMATION FUSION
(2020)
Proceedings Paper
Computer Science, Artificial Intelligence
Ronghang Hu et al.
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019)
(2019)
Proceedings Paper
Computer Science, Artificial Intelligence
Zilong Zheng et al.
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019)
(2019)
Proceedings Paper
Computer Science, Software Engineering
Qiang Sun et al.
ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL
(2019)
Article
Computer Science, Artificial Intelligence
Qi Wu et al.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2018)
Article
Computer Science, Artificial Intelligence
Shaoqing Ren et al.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2017)
Proceedings Paper
Computer Science, Artificial Intelligence
Yash Goyal et al.
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017)
(2017)
Proceedings Paper
Computer Science, Artificial Intelligence
Abhishek Das et al.
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017)
(2017)
Proceedings Paper
Computer Science, Artificial Intelligence
Stanislaw Antol et al.
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
(2015)