4.7 Article

An empirical analysis of the shift and scale parameters in BatchNorm

期刊

INFORMATION SCIENCES
卷 637, 期 -, 页码 -

出版社

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2023.118951

关键词

-

向作者/读者索取更多资源

BatchNorm is shown to improve the training of deep neural networks, especially Convolutional Neural Networks (CNN). The study presents two new optimizers, AffineLayer and BatchNorm-minus, to compare with standard BatchNorm and the case without batch normalization. The research provides empirical evidence that the success of BatchNorm may come primarily from improved weight initialization.
Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks, especially Convolutional Neural Networks (CNN). It has been empirically demonstrated that BatchNorm increases performance, stability, and accuracy, although the reasons for such improvements are unclear. BatchNorm includes a normalization step as well as trainable shift and scale parameters. In this paper, we empirically examine the relative contribution to the success of BatchNorm of the normalization step, as compared to the re-parameterization via shifting and scaling. To conduct our experiments, we implement two new optimizers in PyTorch, namely, a version of BatchNorm that we refer to as AffineLayer, which includes the re-parameterization step without normalization, and a version with just the normalization step, that we call BatchNorm-minus. We compare the performance of our AffineLayer and BatchNorm-minus implementations to standard BatchNorm, and we also compare these to the case where no batch normalization is used. We experiment with four ResNet architectures (ResNet18, ResNet34, ResNet50, and ResNet101) over a standard image dataset and multiple batch sizes. Among other findings, we provide empirical evidence that the success of BatchNorm may derive primarily from improved weight initialization.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据