Journal
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Volume 111, Issue -, Pages -
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2022.104765
Keywords
Reinforcement learning; Safe exploration; Fuzzy Q-learning; Safe reinforcement learning
Funding
- King's College London, United Kingdom
This paper presents a method to improve the safety of agents during the exploration stage of Q-learning. By introducing a safety indicator function and a safe exploration mask, the algorithm reduces the likelihood of unsafe actions and improves its applicability in industrial settings.
Most reinforcement learning algorithms focus on discovering the optimal policy that maximizes reward while neglecting safety during the exploration stage, which is unacceptable in industrial applications. This paper presents an efficient method to improve the safety of the agent during the exploration stage of Q-learning without any prior knowledge. We propose a novel approach, called the safe exploration mask, that reduces the number of safety violations in Q-learning by modifying the transition probabilities of the environment. To this end, a safety indicator function consisting of a distance metric and a controllability metric is designed. The safety indicator function can be learned by the agent through bootstrapping, without an additional optimization solver. We prove that the safety indicator function converges in tabular Q-learning and introduce two tricks to mitigate its divergence in approximation-based Q-learning. Based on the safety indicator function, the safe exploration mask modifies the original exploration policy by reducing the transition probability of unsafe actions. Finally, simulations in both discrete and continuous environments demonstrate the advantages, feasibility, and safety of the method in discrete and continuous Q-learning algorithms.
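The abstract does not give the authors' exact formulation of the safety indicator function or the mask, so the following is only a minimal tabular sketch of the general idea under stated assumptions: alongside the Q-values, the agent bootstraps a risk estimate `R[s][a]` (a stand-in for the paper's safety indicator; the environment, the `RISK_CAP` threshold, and all constants are illustrative choices, not the paper's), and the exploration mask drops any action whose estimated risk exceeds the threshold, i.e. sets its transition probability under the behavior policy to zero.

```python
import random

# Illustrative toy corridor (not from the paper): states 0..6, with a
# hazard at the left end and a goal at the right end.
N_STATES = 7
LEFT, RIGHT = 0, 1
HAZARD, GOAL, START = 0, 6, 3

def step(s, a):
    """Deterministic transition; entering the hazard or the goal ends the episode."""
    s2 = s - 1 if a == LEFT else s + 1
    if s2 == HAZARD:
        return s2, -10.0, True   # safety violation
    if s2 == GOAL:
        return s2, 10.0, True
    return s2, -1.0, False

def run(episodes, use_mask, seed=0):
    """Tabular Q-learning; R[s][a] bootstraps the chance of reaching the hazard."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    R = [[0.0, 0.0] for _ in range(N_STATES)]   # stand-in for the safety indicator
    ALPHA, GAMMA = 0.5, 0.95      # Q-learning rate / discount
    BETA, GAMMA_R = 0.5, 0.8      # risk-learning rate / risk decay
    RISK_CAP, EPS = 0.5, 0.3      # mask threshold, epsilon-greedy rate
    unsafe = 0
    for _ in range(episodes):
        s = START
        for _ in range(30):
            actions = [LEFT, RIGHT]
            if use_mask:
                # Safe exploration mask: remove actions whose estimated risk is
                # too high, zeroing their probability under the behavior policy.
                safe = [a for a in actions if R[s][a] < RISK_CAP]
                if safe:
                    actions = safe
            if rng.random() < EPS:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda k: Q[s][k])
            s2, r, done = step(s, a)
            # Bootstrapped risk update: 1 on a violation, else decayed worst case.
            target_r = (1.0 if s2 == HAZARD else 0.0) if done else GAMMA_R * max(R[s2])
            R[s][a] += BETA * (target_r - R[s][a])
            # Standard Q-learning update.
            target_q = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target_q - Q[s][a])
            unsafe += int(s2 == HAZARD)
            s = s2
            if done:
                break
    return unsafe
```

Comparing `run(400, use_mask=True)` against `run(400, use_mask=False)` with the same seed: the masked agent stops violating safety after its early visits to the hazard (the risk estimate saturates and the action is masked out), while the plain ε-greedy agent typically keeps stumbling into it during exploration.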