Related references
Note: Only part of the references are listed.
HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering
Fei Liu et al.
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) (2021)
AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
Madeleine Grunde-McLaughlin et al.
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)
Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence
Amir Zadeh et al.
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) (2019)
REPAIR: Removing Representation Bias by Dataset Resampling
Yi Li et al.
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) (2019)
Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering
Zhou Yu et al.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2018)
Uncovering the Temporal Context for Video Question Answering
Linchao Zhu et al.
INTERNATIONAL JOURNAL OF COMPUTER VISION (2017)
MarioQA: Answering Questions by Watching Gameplay Videos
Jonghwan Mun et al.
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)
A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering
Tegan Maharaj et al.
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)
Dense-Captioning Events in Videos
Ranjay Krishna et al.
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Yunseok Jang et al.
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)