4.7 Article

LSTM: A Search Space Odyssey

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2016.2582924

Keywords

Functional ANalysis Of VAriance (fANOVA); long short-term memory (LSTM); random search; recurrent neural networks; sequence learning

Funding

  1. Swiss National Science Foundation through the Project Theory and Practice of Reinforcement Learning 2 [138219]
  2. Swiss National Science Foundation through the Project Advanced Reinforcement Learning [156682]
  3. European Institute of Innovation and Technology through the Project Neural Dynamics [FP7-ICT-270247]
  4. European Institute of Innovation and Technology through the Project NASCENCE [FP7-ICT-317662]
  5. European Institute of Innovation and Technology through the Project WAY [FP7-ICT-288551]

Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful functional ANalysis Of VAriance (fANOVA) framework. In total, we summarize the results of 5400 experimental runs (≈15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
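
To make the two components singled out in the abstract concrete, here is a minimal sketch of one time step of a single-unit LSTM cell in plain Python. It is an illustration under stated assumptions, not the paper's implementation: it uses scalars instead of weight matrices, omits peephole connections, and the parameter names (`wz`, `rz`, `bz`, ...) and the two switches `use_forget_gate` and `use_output_activation` are hypothetical names chosen for this example.

```python
import math


def sigmoid(x):
    """Logistic function used for all three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p,
              use_forget_gate=True, use_output_activation=True):
    """One time step of a single-unit LSTM cell (scalars, no peepholes).

    p is a dict of hypothetical scalar parameters: input weight w*,
    recurrent weight r*, and bias b* for the block input (z) and the
    input (i), forget (f), and output (o) gates.
    """
    z = math.tanh(p["wz"] * x + p["rz"] * h_prev + p["bz"])  # block input
    i = sigmoid(p["wi"] * x + p["ri"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["ro"] * h_prev + p["bo"])    # output gate
    if use_forget_gate:
        f = sigmoid(p["wf"] * x + p["rf"] * h_prev + p["bf"])
    else:
        f = 1.0  # no forget gate: the old cell state is carried over unscaled
    c = f * c_prev + i * z  # new cell state
    # Output activation: squash the cell state before the output gate scales it.
    h = o * (math.tanh(c) if use_output_activation else c)
    return h, c


# Usage: run a short input sequence through the cell (all values assumed).
params = {name: 0.5 for name in
          ("wz", "rz", "bz", "wi", "ri", "bi",
           "wf", "rf", "bf", "wo", "ro", "bo")}
h, c = 0.0, 0.0
for x_t in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x_t, h, c, params)
```

Switching off `use_forget_gate` fixes the forget factor at 1, so the cell state can only accumulate; switching off `use_output_activation` leaves the output unbounded, since the cell state is no longer squashed by `tanh`. These are the two ablations that the abstract reports as most harmful.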
