Journal
INTERNATIONAL JOURNAL OF COMPUTER VISION
Volume 123, Issue 1, Pages 4-31
Publisher
SPRINGER
DOI: 10.1007/s11263-016-0966-6
Keywords
Visual Question Answering
Funding
- Paul G. Allen Family Foundation
- National Science Foundation CAREER award
- Army Research Office YIP Award
- Office of Naval Research grant
- ICTAS at Virginia Tech
- Google Faculty Research Awards
- Directorate for Computer & Information Science & Engineering
- Division of Information & Intelligent Systems [1661374] Funding Source: National Science Foundation
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and more complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words, or a closed set of answers can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org) and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).
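The abstract notes that VQA is amenable to automatic evaluation because open-ended answers are short and each question has multiple human-provided answers. A minimal sketch of the consensus-style accuracy this enables is below; the function name and the simplified form (crediting a prediction in proportion to how many of the human answers it matches, capped at 1) are illustrative, not a verbatim transcription of the paper's evaluation code:

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Consensus accuracy for an open-ended answer.

    A predicted answer gets full credit if it matches at least 3 of the
    human-provided answers, and partial credit (matches / 3) otherwise.
    Real evaluation pipelines also normalize answers (lowercasing,
    punctuation stripping, etc.) before comparison; that step is
    omitted here for brevity.
    """
    matches = sum(ans == predicted for ans in human_answers)
    return min(matches / 3.0, 1.0)

# Example: 10 human answers, prediction agrees with 2 of them.
score = vqa_accuracy("yes", ["yes", "yes", "no", "no", "no",
                             "no", "no", "no", "no", "no"])
```

Because most answers are only a few words, exact string matching against the pool of human answers suffices, which is what makes large-scale automatic evaluation practical.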