Journal
INTELLIGENT COMPUTING, VOL 2
Volume 507, Pages 88-105
Publisher: SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-031-10464-0_7
Keywords
State of the Art; Imagenet; Papers with code; Transformers; Convolutional neural networks
We present a review of the methods behind the top 40 highest accuracies achieved on the ILSVRC 2012 Imagenet validation set, as ranked on Papers with Code. A significant proportion of these methods use transformer-based architectures, but none are naive self-attention transformers, which would be unmanageably large if tokens were derived on a per-pixel basis. Rather, the works reviewed here grapple with different ways of combining the global nature of self-attention with the local nature of fine-grained image features, which have historically been the strength of convolutional neural networks. Notably, 9 of the 22 works reviewed did not use transformers at all.
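The claim that per-pixel tokens would be unmanageable can be illustrated with a back-of-envelope count: self-attention compares every token with every other, so cost grows quadratically with token count. The sketch below assumes a standard 224x224 ILSVRC input and the common 16x16 patch size used in ViT-style models; these numbers are illustrative, not taken from the reviewed papers.

```python
# Quadratic cost of self-attention: per-pixel tokens vs. 16x16 patch tokens
# on a 224x224 image (illustrative assumption, not from the paper).

def attention_pairs(num_tokens: int) -> int:
    # Each token attends to every token, including itself: n^2 pairs per layer.
    return num_tokens ** 2

side = 224                           # standard ILSVRC 2012 input resolution
per_pixel_tokens = side * side       # 50,176 tokens if every pixel is a token
patch = 16
patch_tokens = (side // patch) ** 2  # 196 tokens with 16x16 patches

print(attention_pairs(per_pixel_tokens))  # 2,517,630,976 pairs per layer
print(attention_pairs(patch_tokens))      # 38,416 pairs per layer
```

The roughly 65,000-fold gap in attention pairs is why the reviewed architectures coarsen tokens (patches, pooling, windows) or restrict attention locally rather than attending over raw pixels.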