Article

Safe Reinforcement Learning for Model-Reference Trajectory Tracking of Uncertain Autonomous Vehicles With Model-Based Acceleration

Journal

IEEE Transactions on Intelligent Vehicles
Volume 8, Issue 3, Pages 2332-2344

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TIV.2022.3233592

Keywords

Safety; Predictive models; Trajectory tracking; Training; Reinforcement learning; Heuristic algorithms; Uncertainty; Model-reference control; Autonomous vehicle; Safe reinforcement learning; Model-based reinforcement learning; Gaussian process; Control barrier function

Abstract

Applying reinforcement learning (RL) algorithms to control system design remains challenging because of potentially unsafe exploration and low sample efficiency. In this paper, we propose a novel safe model-based RL algorithm for the collision-free model-reference trajectory tracking problem of uncertain autonomous vehicles (AVs). First, a new robust control barrier function (CBF) condition for collision avoidance is derived for the uncertain AVs by incorporating an estimate of the system uncertainty obtained with Gaussian process (GP) regression. Then, a robust CBF-based RL control structure is proposed in which the nominal control input is composed of the RL policy and a model-based reference control policy. The actual control input, obtained by solving a quadratic program, simultaneously satisfies the collision-avoidance, input-saturation, and velocity-boundedness constraints with a relatively high probability. Finally, within this control structure, a Dyna-style safe model-based RL algorithm is developed: safe exploration is achieved by executing the robust CBF-based actions, and sample efficiency is improved by leveraging the GP models. The superior learning performance of the proposed RL control structure is demonstrated through simulation experiments.
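To make the control structure concrete, below is a minimal Python sketch of the kind of robust CBF-based quadratic-programming step the abstract describes: a nominal input (RL action plus the model-based reference action) is minimally modified so that a CBF condition, tightened by a GP-based uncertainty margin, and the input-saturation bounds hold. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the names robust_cbf_filter, lf_h, lg_h, and gp_margin are hypothetical, and the exact barrier function, margin, and additional velocity-boundedness constraint used in the paper may differ.

```python
# Minimal sketch (not the authors' code) of a robust CBF-based QP safety filter.
# The nominal input u_nominal (RL policy + model-based reference policy) is
# minimally modified so that a CBF condition with an uncertainty margin and
# input saturation hold.
import numpy as np
import cvxpy as cp

def robust_cbf_filter(u_nominal, h, lf_h, lg_h, gp_margin, u_min, u_max, alpha=1.0):
    """Return a safe control input close to the nominal one.

    h          : value of the barrier function at the current state
    lf_h, lg_h : Lie derivatives of h along the nominal dynamics
    gp_margin  : high-probability bound on the uncertainty term, e.g. taken
                 from a GP posterior (mean error + k * standard deviation)
    """
    m = u_nominal.shape[0]
    u = cp.Variable(m)
    objective = cp.Minimize(cp.sum_squares(u - u_nominal))
    constraints = [
        # robust CBF condition: the margin accounts for the model uncertainty
        lf_h + lg_h @ u + alpha * h - gp_margin >= 0,
        u >= u_min,          # input saturation (lower bound)
        u <= u_max,          # input saturation (upper bound)
    ]
    cp.Problem(objective, constraints).solve()
    return u.value

# Toy usage with a 2-D control input
u_safe = robust_cbf_filter(
    u_nominal=np.array([1.0, 0.2]), h=0.8, lf_h=0.5,
    lg_h=np.array([0.3, -0.1]), gp_margin=0.05,
    u_min=np.array([-2.0, -0.5]), u_max=np.array([2.0, 0.5]))
```

In a Dyna-style loop such as the one outlined in the abstract, filtered actions of this form would be executed on the real system while the GP models generate additional simulated transitions for updating the RL policy.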
