Related references
Note: Only part of the references are listed.Recurrent Attention Network with Reinforced Generator for Visual Dialog
Hehe Fan et al.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2020)
Unified Spatio-Temporal Attention Networks for Action Recognition in Videos
Dong Li et al.
IEEE TRANSACTIONS ON MULTIMEDIA (2019)
COCO-CN for Cross-Lingual Image Tagging, Captioning, and Retrieval
Xirong Li et al.
IEEE TRANSACTIONS ON MULTIMEDIA (2019)
Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining
Yundong Zhang et al.
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) (2019)
BTDP: Toward Sparse Fusion with Block Term Decomposition Pooling for Visual Question Answering
Zhiwei Fang et al.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2019)
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Yangyang Guo et al.
PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19) (2019)
Scale-Aware Fast R-CNN for Pedestrian Detection
Jianan Li et al.
IEEE TRANSACTIONS ON MULTIMEDIA (2018)
Visual question answering: A survey of methods and datasets
Qi Wu et al.
COMPUTER VISION AND IMAGE UNDERSTANDING (2017)
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal et al.
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)
VQA: Visual Question Answering
Stanislaw Antol et al.
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2015)