Journal
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Volume 111, Issue -, Pages -
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2022.104765
Keywords
Reinforcement learning; Safe exploration; Fuzzy Q-learning; Safe reinforcement learning
Funding
- King's College London, United Kingdom
This paper presents a method to improve the safety of agents during the exploration stage of Q-learning. By introducing a safety indicator function and a safe exploration mask, the algorithm reduces the likelihood of unsafe actions and improves its applicability in industrial settings.
Most reinforcement learning algorithms focus on discovering the optimal policy that maximizes reward while neglecting safety during the exploration stage, which is unacceptable in industrial applications. This paper presents an efficient method to improve the safety of the agent during the exploration stage of Q-learning without any prior knowledge. We propose a novel approach, called the safe exploration mask, that reduces the number of safety violations in Q-learning by modifying the transition probabilities of the environment. To this end, a safety indicator function consisting of a distance metric and a controllability metric is designed. The safety indicator function can be learned by the agent through bootstrapping, without an additional optimization solver. We prove that the safety indicator function converges in tabular Q-learning and introduce two tricks to mitigate its divergence in approximation-based Q-learning. Based on the safety indicator function, the safe exploration mask modifies the original exploration policy by reducing the transition probability of unsafe actions. Finally, simulations in both discrete and continuous environments demonstrate the advantages, feasibility, and safety of the method in discrete and continuous Q-learning algorithms.
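The abstract does not give the authors' exact formulation of the safety indicator function or the mask, so the following is only a minimal tabular sketch of the general idea under stated assumptions: alongside the Q-values, the agent bootstraps a risk estimate `R[s][a]` (a stand-in for the paper's safety indicator; the environment, the `RISK_CAP` threshold, and all constants are illustrative choices, not the paper's), and the exploration mask drops any action whose estimated risk exceeds the threshold, i.e. sets its transition probability under the behavior policy to zero.

```python
import random

# Illustrative toy corridor (not from the paper): states 0..6, with a
# hazard at the left end and a goal at the right end.
N_STATES = 7
LEFT, RIGHT = 0, 1
HAZARD, GOAL, START = 0, 6, 3

def step(s, a):
    """Deterministic transition; entering the hazard or the goal ends the episode."""
    s2 = s - 1 if a == LEFT else s + 1
    if s2 == HAZARD:
        return s2, -10.0, True   # safety violation
    if s2 == GOAL:
        return s2, 10.0, True
    return s2, -1.0, False

def run(episodes, use_mask, seed=0):
    """Tabular Q-learning; R[s][a] bootstraps the chance of reaching the hazard."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    R = [[0.0, 0.0] for _ in range(N_STATES)]   # stand-in for the safety indicator
    ALPHA, GAMMA = 0.5, 0.95      # Q-learning rate / discount
    BETA, GAMMA_R = 0.5, 0.8      # risk-learning rate / risk decay
    RISK_CAP, EPS = 0.5, 0.3      # mask threshold, epsilon-greedy rate
    unsafe = 0
    for _ in range(episodes):
        s = START
        for _ in range(30):
            actions = [LEFT, RIGHT]
            if use_mask:
                # Safe exploration mask: remove actions whose estimated risk is
                # too high, zeroing their probability under the behavior policy.
                safe = [a for a in actions if R[s][a] < RISK_CAP]
                if safe:
                    actions = safe
            if rng.random() < EPS:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda k: Q[s][k])
            s2, r, done = step(s, a)
            # Bootstrapped risk update: 1 on a violation, else decayed worst case.
            target_r = (1.0 if s2 == HAZARD else 0.0) if done else GAMMA_R * max(R[s2])
            R[s][a] += BETA * (target_r - R[s][a])
            # Standard Q-learning update.
            target_q = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target_q - Q[s][a])
            unsafe += int(s2 == HAZARD)
            s = s2
            if done:
                break
    return unsafe
```

Comparing `run(400, use_mask=True)` against `run(400, use_mask=False)` with the same seed: the masked agent stops violating safety after its early visits to the hazard (the risk estimate saturates and the action is masked out), while the plain ε-greedy agent typically keeps stumbling into it during exploration.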