4.6 Article

Metaformer: A Transformer That Tends to Mine Metaphorical-Level Information

Journal

Sensors
Volume 23, Issue 11

Publisher

MDPI
DOI: 10.3390/s23115093

Keywords

hierarchical attention; multi-head attention; graph neural networks; feature diversity; information interaction


The Transformer model has had a significant impact on many areas of machine learning, including time series prediction. However, existing multi-head attention mechanisms suffer from feature redundancy and wasted computation. To address these problems, this paper proposes a hierarchical attention mechanism and uses graph-network-based global feature aggregation to increase the diversity of extracted information and improve performance.
Since its introduction, the Transformer model has dramatically influenced many fields of machine learning. Time series prediction has been particularly affected: Transformer-family models have flourished there, and many variants have been derived. These models rely mainly on attention mechanisms for feature extraction and on multi-head attention to strengthen it. However, multi-head attention is essentially a simple superposition of identical attention operations, so it does not guarantee that the model captures different features; on the contrary, it can introduce substantial information redundancy and waste computational resources. To ensure that the Transformer captures information from multiple perspectives and to increase the diversity of the features it extracts, this paper proposes, for the first time, a hierarchical attention mechanism that addresses two shortcomings of traditional multi-head attention: the limited diversity of the captured information and the lack of information interaction among the heads. Additionally, global feature aggregation using graph networks is employed to mitigate inductive bias. Finally, we conducted experiments on four benchmark datasets; the results show that the proposed model outperforms the baseline models on several metrics.
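The abstract names two mechanisms but gives no implementation details, so the following PyTorch sketch is purely illustrative: one plausible reading of a hierarchical attention layer in which heads are computed sequentially and each head conditions on the previous head's output (information interaction among heads), followed by a graph-style aggregation that treats the head outputs as nodes of a fully connected graph. The class name HierarchicalAttention, the per-head qkv projections, the bridge layers, and the uniform-weight aggregation are all assumptions of this sketch, not the authors' code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HierarchicalAttention(nn.Module):
        """Sequential ("hierarchical") attention heads with inter-head
        interaction, then a graph-style aggregation over head outputs.
        Illustrative sketch only; not the paper's implementation."""

        def __init__(self, d_model: int, n_heads: int):
            super().__init__()
            assert d_model % n_heads == 0
            self.n_heads, self.d_head = n_heads, d_model // n_heads
            # Separate Q/K/V projections per head so heads can specialize.
            self.qkv = nn.ModuleList(
                [nn.Linear(d_model, 3 * self.d_head) for _ in range(n_heads)]
            )
            # "Bridge" layers feed each head's output into the next head's
            # input, encouraging heads to extract complementary features.
            self.bridge = nn.ModuleList(
                [nn.Linear(self.d_head, d_model) for _ in range(n_heads - 1)]
            )
            self.out = nn.Linear(d_model, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            heads, h_in = [], x
            for i in range(self.n_heads):
                q, k, v = self.qkv[i](h_in).chunk(3, dim=-1)
                scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
                h = F.softmax(scores, dim=-1) @ v    # (batch, seq_len, d_head)
                heads.append(h)
                if i < self.n_heads - 1:             # pass information onward
                    h_in = x + self.bridge[i](h)
            # Global aggregation: heads as nodes of a fully connected graph
            # with uniform edge weights; mix in the neighborhood mean
            # (a one-step, GCN-style message-passing update).
            H = torch.stack(heads)                   # (n_heads, batch, seq, d_head)
            H = 0.5 * H + 0.5 * H.mean(dim=0, keepdim=True)
            return self.out(torch.cat(tuple(H), dim=-1))

For example, layer = HierarchicalAttention(d_model=64, n_heads=4) applied to torch.randn(8, 96, 64) (eight 96-step series windows) returns a tensor of the same shape, so a module of this form could drop in where standard multi-head attention sits in an encoder block.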


