Proceedings Paper

Training Agents to Satisfy Timed and Untimed Signal Temporal Logic Specifications with Reinforcement Learning

Publisher

Springer International Publishing AG
DOI: 10.1007/978-3-031-17108-6_12

Keywords

Deep Reinforcement Learning; Safe Reinforcement Learning; Signal Temporal Logic; Curriculum Learning

Funding

  1. Defense Advanced Research Projects Agency (DARPA) [FA8750-18-C-0089]
  2. Air Force Office of Scientific Research (AFOSR) [FA9550-22-1-0019]
  3. National Science Foundation (NSF), Directorate for Engineering, Division of Electrical, Communications and Cyber Systems [2028001]
  4. Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program

Abstract

Reinforcement Learning (RL) depends critically on how reward functions are designed to capture intended behavior. However, traditional approaches cannot represent temporal behavior, such as "complete task 1 before starting task 2." Even when temporal behavior can be expressed, the reward functions are handcrafted by researchers and often require long hours of trial-and-error shaping to produce the desired behavior. In these cases the desired behavior is already known; the problem is generating a reward function that trains the RL agent to satisfy it. To address this issue, we present our approach for automatically converting timed and untimed specifications into a reward function, implemented as the tool STLGym. In this work, we show how STLGym can be used to train RL agents to satisfy specifications better than traditional approaches and to refine learned behavior to better match the specification.
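Although the abstract leaves the construction implicit, the core idea is to score a trajectory with the quantitative semantics (robustness) of the specification and hand that score back to the agent as reward. The following is a minimal sketch of that idea, not the STLGym implementation itself: the wrapper class, the `signal_index` parameter, and the hard-coded specification G(x >= 0) are illustrative assumptions, chosen because the robustness of this particular formula reduces to a running minimum of the signal.

```python
import gymnasium as gym
import numpy as np


class STLRewardWrapper(gym.Wrapper):
    """Reward an agent with the robustness of the untimed STL
    specification G(x >= 0) ("x stays non-negative").  For this
    formula the robustness of a trace is min_t x_t, so it can be
    tracked incrementally with a running minimum."""

    def __init__(self, env, signal_index=0):
        super().__init__(env)
        self.signal_index = signal_index  # which observation dim is the signal x
        self.min_x = np.inf               # running value of min_t x_t

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.min_x = float(obs[self.signal_index])
        return obs, info

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        self.min_x = min(self.min_x, float(obs[self.signal_index]))
        # Sparse formulation: the robustness value is paid out once,
        # when the episode ends; intermediate steps earn zero reward.
        done = terminated or truncated
        reward = self.min_x if done else 0.0
        return obs, reward, terminated, truncated, info
```

Swapping the hand-rolled minimum for a general STL monitor (for example, one built with the rtamt library) would extend the same wrapper to arbitrary timed and untimed formulas, and emitting the per-step change in robustness instead of a single end-of-episode payout would give a dense variant of the reward.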
