Article

Achieving Human Parity on Visual Question Answering

Related references

Note: Only some of the references are listed.

Proceedings Paper Computer Science, Artificial Intelligence

High-Dimensional Sparse Cross-Modal Hashing with Fine-Grained Similarity Embedding

Yongxin Wang et al.

Summary: This study introduces an efficient sparse hashing method for cross-modal retrieval that outperforms state-of-the-art approaches. By using sparse coding and discrete optimization algorithms, the method reduces quantization error and improves the discriminative power of the hash codes. Experimental results demonstrate the efficiency and effectiveness of the proposed high-dimensional sparse cross-modal hashing approach.

PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) (2021)

Proceedings Paper Computer Science, Information Systems

GilBERT: Generative Vision-Language Pre-Training for Image-Text Retrieval

Weixiang Hong et al.

Summary: The proposed GilBERT is a generative visual-linguistic pre-training approach that learns generic representations of image-text data and completes the missing modality for incomplete pairs. At test time, GilBERT supports efficient vector-based retrieval by providing unified feature embeddings for queries and database items. Generative training lets GilBERT model image-text relationships without large numbers of randomly sampled negative samples, leading to superior experimental performance.

SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (2021)

Proceedings Paper Computer Science, Information Systems

Answering Any-hop Open-domain Questions with Iterative Document Reranking

Yuyu Zhang et al.

Summary: This study introduces a unified QA framework that answers any-hop open-domain questions by iteratively retrieving, reranking, and filtering documents, and by adaptively determining when to stop retrieval, which improves retrieval accuracy. A graph-based reranking model enables the method to perform well on both single-hop and multi-hop open-domain QA datasets.

SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (2021)

Article Computer Science, Information Systems

Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval

Miaomiao Cheng et al.

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2020)

Proceedings Paper Computer Science, Information Systems

Training Curricula for Open Domain Answer Re-Ranking

Sean MacAvaney et al.

PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20) (2020)

Proceedings Paper Computer Science, Information Systems

Document Gated Reader for Open-Domain Question Answering

Bingning Wang et al.

PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19) (2019)

Proceedings Paper Computer Science, Information Systems

Human Behavior Inspired Machine Reading Comprehension

Yukun Zheng et al.

PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19) (2019)

Article Computer Science, Artificial Intelligence

VQA: Visual Question Answering

Aishwarya Agrawal et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2017)

Article Computer Science, Artificial Intelligence

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Ranjay Krishna et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2017)

Article Multidisciplinary Sciences

Mastering the game of Go with deep neural networks and tree search

David Silver et al.

NATURE (2016)

Article Computer Science, Artificial Intelligence

Building Watson: An Overview of the DeepQA Project

David Ferrucci et al.

AI MAGAZINE (2010)