☆ 4.7 Article

Achieving efficient interpretability of reinforcement learning via policy distillation and selective input gradient regularization

NEURAL NETWORKS (2023)

Journal

NEURAL NETWORKS

Volume 161, Pages 228-241

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.neunet.2023.01.025

Keywords

Efficient interpretability; Interpretable reinforcement learning; Saliency map

Categories

Computer Science, Artificial Intelligence; Neurosciences

AI-generated summary

Although deep Reinforcement Learning (RL) has been successful in various tasks, interpretability remains a challenge in real-world applications. Existing saliency map approaches in the RL domain either lack real-time capability or fail to produce interpretable saliency maps. This work presents an approach, called Distillation with selective Input Gradient Regularization (DIGR), that combines policy distillation and input gradient regularization to generate saliency maps with high interpretability and computation efficiency. Experimental results on MiniGrid (Fetch Object), Atari (Breakout), and CARLA Autonomous Driving tasks demonstrate the importance and effectiveness of the proposed approach.

Abstract

Although deep Reinforcement Learning (RL) has proven successful in a wide range of tasks, one challenge it faces is interpretability when applied to real-world problems. Saliency maps are frequently used to provide interpretability for deep neural networks. However, in the RL domain, existing saliency map approaches are either computationally expensive, and thus cannot satisfy the real-time requirement of real-world scenarios, or cannot produce interpretable saliency maps for RL policies. In this work, we propose an approach called Distillation with selective Input Gradient Regularization (DIGR), which uses policy distillation and input gradient regularization to produce new policies that achieve both high interpretability and computation efficiency in generating saliency maps. Our approach is also found to improve the robustness of RL policies to multiple adversarial attacks. We conduct experiments on three tasks, MiniGrid (Fetch Object), Atari (Breakout) and CARLA Autonomous Driving, to demonstrate the importance and effectiveness of our approach. © 2023 Published by Elsevier Ltd.
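As a rough illustration of how policy distillation and input gradient regularization can be combined in one objective, the sketch below uses a toy linear-softmax student policy so that the input gradient has a closed form. It is not the authors' implementation. The function names, the KL distillation term, the L1 penalty, and the weight `lam` are all assumptions made for illustration, and the "selective" part of DIGR is not modeled because the abstract does not describe it.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(weights, x):
    """Action probabilities of a toy linear-softmax student; weights[a] is the row for action a."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def saliency(weights, x, action):
    """|d log pi(action | x) / dx|, in closed form for the linear-softmax student.

    For this model the gradient is weights[action] - sum_k pi_k * weights[k].
    """
    probs = policy(weights, x)
    result = []
    for i in range(len(x)):
        expected_w = sum(p * row[i] for p, row in zip(probs, weights))
        result.append(abs(weights[action][i] - expected_w))
    return result


def distillation_loss(teacher_probs, student_probs):
    """KL(teacher || student), a common choice for policy distillation (assumed here)."""
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )


def regularized_distillation_loss(weights, x, teacher_probs, lam=0.1):
    """Distillation term plus an L1 penalty on the input gradient (hypothetical weighting lam)."""
    student_probs = policy(weights, x)
    # Penalize the gradient of the teacher's preferred action (an illustrative choice).
    action = max(range(len(teacher_probs)), key=teacher_probs.__getitem__)
    penalty = sum(saliency(weights, x, action))
    return distillation_loss(teacher_probs, student_probs) + lam * penalty
```

In this toy setting the saliency map comes from a single gradient evaluation, which is what makes gradient-based maps cheap at run time. The penalty pushes the student toward small input gradients, so the map tends to keep only the inputs that matter for the chosen action.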
