Review

A review on the attention mechanism of deep learning

NEUROCOMPUTING (2021)

Journal

NEUROCOMPUTING

Volume 452, Pages 48-62

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2021.03.091

Keywords

Attention mechanism; Deep learning; Recurrent Neural Network (RNN); Convolutional Neural Network (CNN); Encoder-decoder; Unified attention model; Computer vision applications; Natural language processing applications

Funding

National Key Research and Development Program of China [2018AAA0100400]
Joint Fund of the Equipments PreResearch and Ministry of Education of China [6141A020337]
Natural Science Foundation of Shandong Province [ZR2020MF131]
Science and Technology Program of Qingdao [2114ny19nsh]

Automated Summary

This paper provides an overview of state-of-the-art attention models and defines a unified model suitable for most attention structures. It describes in detail each step of the attention mechanism implemented in the model and classifies existing attention models based on four criteria. Additionally, it summarizes the use of attention mechanisms in network architectures and typical applications.

Abstract

Attention has arguably become one of the most important concepts in the deep learning field. It is inspired by the biological systems of humans that tend to focus on the distinctive parts when processing large amounts of information. With the development of deep neural networks, the attention mechanism has been widely used in diverse application domains. This paper aims to give an overview of the state-of-the-art attention models proposed in recent years. Toward a better general understanding of attention mechanisms, we define a unified model that is suitable for most attention structures. Each step of the attention mechanism implemented in the model is described in detail. Furthermore, we classify existing attention models according to four criteria: the softness of attention, forms of input feature, input representation, and output representation. Besides, we summarize network architectures used in conjunction with the attention mechanism and describe some typical applications of the attention mechanism. Finally, we discuss the interpretability that attention brings to deep learning and present its potential future trends. (c) 2021 Elsevier B.V. All rights reserved.
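
The abstract refers to the steps of an attention mechanism without listing them. As a rough illustration only, the sketch below assumes one common soft-attention formulation: score each key against a query, normalize the scores with a softmax, and take the weighted sum of the values. The scaled dot-product score, the function names, and the toy vectors are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, keys, values):
    """Hypothetical helper: score keys, normalize, then average the values."""
    # Step 1: scaled dot-product score between the query and each key
    # (an assumed scoring function; other choices exist).
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Step 2: normalize the scores into attention weights.
    weights = softmax(scores)
    # Step 3: context vector = attention-weighted sum of the values.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy inputs, made up for illustration.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, context = soft_attention(query, keys, values)
print(weights, context)
```

Because every weight is non-zero, this is a "soft" variant in the sense of the abstract's first classification criterion; a hard variant would instead select a single value.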
