Article

Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks

Journal

IEEE Access
Volume 8, Pages 52588-52608

Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/ACCESS.2020.2981072

Keywords

Deep neural networks; hyperparameter optimization; Bayesian optimization; diversification; early termination; parallelization; cost function transformation

Funding

  1. National Research Foundation of Korea (NRF) - Korean Government (MSIT) [NRF-2017R1E1A1A03070560]
  2. Electronics and Telecommunications Research Institute (ETRI) - Korean Government (Distributed Intelligence Core Technology of Hyper-Connected Space) [19ZH1100]


Compared to traditional machine learning models, deep neural networks (DNNs) are known to be highly sensitive to the choice of hyperparameters. While the time and effort required for manual tuning have been decreasing rapidly for well-developed and commonly used DNN architectures, DNN hyperparameter optimization will undoubtedly remain a major burden whenever a new DNN architecture needs to be designed, a new task needs to be solved, a new dataset needs to be addressed, or an existing DNN needs to be improved further. For hyperparameter optimization of general machine learning problems, numerous automated solutions have been developed, some of the most popular of which are based on Bayesian Optimization (BO). In this work, we analyze four fundamental strategies for enhancing BO when it is used for DNN hyperparameter optimization: diversification, early termination, parallelization, and cost function transformation. Based on the analysis, we provide a simple yet robust algorithm for DNN hyperparameter optimization, DEEP-BO (Diversified, Early-termination-Enabled, and Parallel Bayesian Optimization). When evaluated over six DNN benchmarks, DEEP-BO mostly outperformed well-known solutions, including GP-Hedge, BOHB, and speed-up variants that use the Median Stopping Rule or Learning Curve Extrapolation. In fact, DEEP-BO consistently provided the top, or close to the top, performance across all of the benchmark types we tested, indicating that it is a robust solution compared to existing alternatives. The DEEP-BO code is publicly available at https://github.com/snu-adsl/DEEP-BO.
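For readers unfamiliar with the baseline being enhanced, the following is a minimal, NumPy-only sketch of the basic sequential BO loop that approaches like DEEP-BO build on: fit a Gaussian-process surrogate to the hyperparameter/loss pairs observed so far, then pick the next configuration by maximizing an acquisition function (here, expected improvement). This is not the DEEP-BO implementation; the objective, kernel settings, and the name train_and_evaluate are hypothetical placeholders chosen purely for illustration.

```python
# Minimal sketch of a sequential GP-based Bayesian optimization loop for one
# hyperparameter (log10 of the learning rate). Illustrative only; not DEEP-BO.
import numpy as np
from math import erf

# Standard normal CDF, vectorized over NumPy arrays.
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / np.sqrt(2.0))))

def train_and_evaluate(log10_lr):
    """Stand-in for training a DNN at a given learning rate and returning its
    validation loss; here a cheap synthetic 1-D function."""
    return 0.1 * (log10_lr + 3.0) ** 2 + 0.05 * np.sin(5.0 * log10_lr)

def rbf_kernel(a, b, length=0.5, var=1.0):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Gaussian-process posterior mean/std at candidates Xs given data (X, y)."""
    y_mean = y.mean()
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    mu = Ks.T @ alpha + y_mean
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf_kernel(Xs, Xs)) - np.sum(v * v, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected margin by which a candidate beats the
    best loss observed so far."""
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return np.maximum((best - mu) * _norm_cdf(z) + sigma * pdf, 0.0)

rng = np.random.default_rng(0)
X = rng.uniform(-5.0, -1.0, size=3)               # initial random log10(lr) draws
y = np.array([train_and_evaluate(x) for x in X])

for _ in range(20):                                # sequential BO iterations
    candidates = rng.uniform(-5.0, -1.0, size=512)
    mu, sigma = gp_posterior(X, y, candidates)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)                       # evaluate and record
    y = np.append(y, train_and_evaluate(x_next))

print(f"best log10(lr): {X[np.argmin(y)]:.3f}  (loss {y.min():.4f})")
```

Roughly speaking, the paper's four strategies each target a different weakness of such a loop: diversification hedges across multiple surrogate/acquisition combinations rather than committing to one, early termination abandons clearly unpromising training runs before they finish, parallelization spreads candidate evaluations across concurrent workers, and cost function transformation reshapes the observed loss so the surrogate can model it more easily.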

