☆ 4.7 Article

DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING

Volume 32, Issue -, Pages 4812-4827

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TIP.2023.3306910

Keywords

Cognition; Visualization; Task analysis; Question answering (information retrieval); Semantics; Geometry; Routing; Diagram understanding; visual reasoning; diagram question answering

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

This paper proposes a Disentangled Adaptive Visual Reasoning Network (DisAVR) for Diagram Question Answering (DQA), which addresses the challenges of diagram representation and reasoning. DisAVR consists of improved region feature learning, question parsing, and disentangled adaptive reasoning modules. Experimental results demonstrate the superiority of DisAVR.

Diagram Question Answering (DQA) aims to correctly answer questions about given diagrams, which demands an interplay of good diagram understanding and effective reasoning. However, the same appearance of objects in diagrams can express different semantics. This kind of visual semantic ambiguity problem makes it challenging to represent diagrams sufficiently for better understanding. Moreover, since there are questions about diagrams from different perspectives, it is also crucial to perform flexible and adaptive reasoning on content-rich diagrams. In this paper, we propose a Disentangled Adaptive Visual Reasoning Network for DQA, named DisAVR, to jointly optimize the dual-process of representation and reasoning. DisAVR mainly comprises three modules: improved region feature learning, question parsing, and disentangled adaptive reasoning. Specifically, the improved region feature learning module is designed to first learn robust diagram representation by integrating detail-aware patch features and semantically-explicit text features with region features. Subsequently, the question parsing module decomposes the question into three types of question guidance including region, spatial relation and semantic relation guidance to dynamically guide subsequent reasoning. Next, the disentangled adaptive reasoning module decomposes the whole reasoning process by employing three visual reasoning cells to construct a soft fully-connected multi-layer stacked routing space. These three cells in each layer reason over object regions, semantic and spatial relations in the diagram under the corresponding question guidance. Moreover, an adaptive routing mechanism is designed to flexibly explore more optimal reasoning paths for specific diagram-question pairs. Extensive experiments on three DQA datasets demonstrate the superiority of our DisAVR.

DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper