Article

Cognitive Optimal-Setting Control of AIoT Industrial Applications With Deep Reinforcement Learning

Journal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
Volume 17, Issue 3, Pages 2116-2123

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TII.2020.2986501

Keywords

Approximation algorithms; Optimization; Machine learning; Informatics; Cognitive systems; Acceleration; Cognitive learning; deep reinforcement learning; expectation-based method; overfitting

Funding

  1. Ministry of Science and Technology of the Republic of China [MOST 108-2511-H-143-001]

Abstract

This article proposes a new expected advantage learning method that moderates the maximum value of expectation-based deep reinforcement learning for industrial applications. By replacing the sigmoid function with the tanh function as the softmax activation value, the proposed method mitigates numerical overfitting in cognitive computing.
In industrial applications of the artificial intelligence of things (AIoT), mechanical control usually affects overall product output and the production schedule. Recently, more and more engineers have applied deep reinforcement learning to mechanical control to improve company profits. However, during the training stage of deep reinforcement learning, overfitting often occurs, which results in accidental control actions and increases the risk of overcontrol. To address this problem, this article proposes an expected advantage learning method that moderates the maximum value of expectation-based deep reinforcement learning for industrial applications. In the tanh softmax policy, the sigmoid function is replaced with the tanh function as the softmax activation value, so that the proposed expectation-based method reduces value overfitting in cognitive computing. In the experiments, the deep Q-network (DQN) algorithm, the advantage learning (AL) algorithm, and the proposed expected advantage learning method were evaluated in every episode on four criteria: total score, total steps, average score, and highest score. Compared with the AL algorithm, the total score of the proposed expected advantage learning method increased by 6% with the same amount of training. This shows that the action probability distribution of the proposed expected advantage learning method outperforms the traditional softmax strategy for the optimal-setting control of industrial applications.
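
The abstract does not include an implementation, but the core mechanism it describes, squashing Q-values with tanh before the softmax so that the resulting expectation stands in for the hard maximum, can be sketched in a few lines. The following NumPy sketch is illustrative only: the function names tanh_softmax and expected_value and the temperature parameter tau are assumptions, not the authors' code.

    import numpy as np

    def tanh_softmax(q_values, tau=1.0):
        # Softmax over tanh-squashed Q-values. Bounding the logits to
        # (-1, 1) keeps a single large Q-value from dominating the
        # action distribution, which is the moderation effect the
        # abstract attributes to the tanh softmax policy.
        z = np.tanh(q_values) / tau
        z = z - z.max()          # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def expected_value(q_values, tau=1.0):
        # Expectation of Q under the tanh-softmax policy: a softened
        # stand-in for max(Q) in the Bellman target of an
        # expectation-based method such as advantage learning.
        return float(np.dot(tanh_softmax(q_values, tau), q_values))

    q = np.array([1.0, 1.2, 9.0])       # one outlier Q-value
    print(tanh_softmax(q))              # roughly [0.30, 0.32, 0.38]
    print(expected_value(q))            # about 4.1, well below max(q) = 9.0

Because tanh saturates, the outlier action at Q = 9.0 receives only a modestly higher probability than its neighbors, whereas a standard softmax over the raw Q-values would assign it more than 99% of the mass; this is one plausible reading of how the tanh activation tempers value overfitting.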
