Article

Bold driver and static restart fused adaptive momentum for visual question answering

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 65, Issue 2, Pages 921-943

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-022-01775-5

Keywords

Visual question answering (VQA); Momentum; Stacked attention networks (SANs); Learning rate adaptation (LRA); Bold driver; Static restart


Stacked attention networks (SANs) are a classic model for visual question answering (VQA) and have effectively promoted research progress in VQA. This paper proposes a method called bold driver and static restart fused adaptive momentum (BDSRM) to optimize SANs, by fusing bold driver and static restart (BDSR) into momentum. The experiments demonstrate that BDSRM outperforms state-of-the-art optimization algorithms on SANs.
Stacked attention networks (SANs) are among the most classic models for visual question answering (VQA) and have substantially advanced VQA research. Existing work has used momentum to optimize SANs with impressive results. However, error analysis shows that the fixed global learning rate in momentum makes it prone to getting trapped in local optima. Many learning rate adaptation (LRA) algorithms (e.g., static restart, bold driver) have been proposed to address this issue by adjusting the global learning rate, but they still have defects. For example, static restart uses too high a restart learning rate and adapts the global learning rate blindly; bold driver removes this blindness, but its adaptive parameters are improperly set. To address these issues, we fuse bold driver and static restart (BDSR) into momentum to devise a method called bold driver and static restart fused adaptive momentum (BDSRM). We then analyze its optimization process and time complexity and conduct quantitative experiments on VQAv1, CIFAR-10, and similar models to verify that BDSRM outperforms state-of-the-art optimization algorithms on SANs. Finally, ablation and visualization experiments verify the effectiveness of BDSR.
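
The abstract names three ingredients: momentum, bold driver, and static restart. As a rough illustration of how such heuristics can be combined in general, the Python sketch below applies textbook forms of each to a scalar parameter. It is not the authors' BDSRM: the abstract does not give the update rules, so the function names, the growth and shrink factors (`up`, `down`), the restart interval, and the restart learning rate are all assumptions chosen for illustration.

```python
def bold_driver(lr, prev_loss, loss, up=1.05, down=0.5):
    """Textbook bold driver: grow lr slightly if the loss fell, cut it otherwise.

    The factors `up` and `down` are illustrative assumptions.
    """
    return lr * up if loss < prev_loss else lr * down


def train(grad_fn, loss_fn, w, lr=0.1, beta=0.9, epochs=30,
          restart_every=10, restart_lr=0.1):
    """Momentum updates with a per-epoch learning-rate adjustment.

    Every `restart_every` epochs the learning rate is reset to `restart_lr`
    (a static restart); on all other epochs it is adjusted by bold driver.
    This schedule is a hypothetical combination, not the paper's algorithm.
    """
    v = 0.0                      # momentum (velocity) term
    prev_loss = loss_fn(w)
    history = []                 # (epoch, lr after adjustment, loss)
    for epoch in range(1, epochs + 1):
        v = beta * v - lr * grad_fn(w)   # classical momentum update
        w = w + v
        loss = loss_fn(w)
        if epoch % restart_every == 0:
            lr = restart_lr              # static restart
        else:
            lr = bold_driver(lr, prev_loss, loss)
        prev_loss = loss
        history.append((epoch, lr, loss))
    return w, history


if __name__ == "__main__":
    # Toy quadratic objective, used only to exercise the schedule.
    loss_fn = lambda w: (w - 3.0) ** 2
    grad_fn = lambda w: 2.0 * (w - 3.0)
    w_final, history = train(grad_fn, loss_fn, w=0.0)
    for epoch, lr, loss in history:
        print(f"epoch {epoch:2d}  lr={lr:.4f}  loss={loss:.6f}")
```

In this sketch the learning rate is changed once per epoch based on whether the loss improved, which is the sense in which bold driver avoids adjusting the rate "blindly"; the periodic reset stands in for static restart. How BDSRM actually sets the restart rate and the adaptive parameters is specified in the paper, not here.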
