Journal
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES
Volume 8, Issue 3, Pages 2332-2344
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIV.2022.3233592
Keywords
Safety; Predictive models; Trajectory tracking; Training; Reinforcement learning; Heuristic algorithms; Uncertainty; Model-reference control; autonomous vehicle; safe reinforcement learning; model-based reinforcement learning; Gaussian process; control barrier function
In this paper, a novel safe model-based RL algorithm is proposed to solve the collision-free model-reference trajectory tracking problem of uncertain autonomous vehicles (AVs). A new type of robust control barrier function (CBF) condition for collision-avoidance is derived by incorporating the estimation of the system uncertainty with Gaussian process (GP) regression. A robust CBF-based RL control structure is proposed, and within this structure, a Dyna-style safe model-based RL algorithm is developed to achieve safe exploration and improve sample efficiency.
Applying reinforcement learning (RL) algorithms to control system design remains challenging because exploration can be unsafe and sample efficiency is low. In this paper, we propose a novel safe model-based RL algorithm to solve the collision-free model-reference trajectory tracking problem of uncertain autonomous vehicles (AVs). First, a new type of robust control barrier function (CBF) condition for collision avoidance is derived for the uncertain AVs by incorporating an estimate of the system uncertainty obtained with Gaussian process (GP) regression. Then, a robust CBF-based RL control structure is proposed, in which the nominal control input is composed of the RL policy and a model-based reference control policy. The actual control input, obtained by solving a quadratic programming problem, satisfies the collision-avoidance, input-saturation and velocity-boundedness constraints simultaneously with relatively high probability. Finally, within this control structure, a Dyna-style safe model-based RL algorithm is proposed, in which safe exploration is achieved by executing the robust CBF-based actions and sample efficiency is improved by leveraging the GP models. The superior learning performance of the proposed RL control structure is demonstrated through simulation experiments.
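To make the idea of a CBF-based safety filter concrete, here is a minimal sketch and not the paper's formulation. It assumes a toy single-integrator model x_dot = u + d(x), a circular obstacle, and a GP estimate of the disturbance d given as a mean vector and a scalar standard deviation. With only one affine constraint, the quadratic program has a closed-form solution, so no solver is needed; the input-saturation and velocity constraints mentioned in the abstract are omitted. The function name and the parameters `alpha` and `k_delta` are hypothetical.

```python
import math


def robust_cbf_filter(x, x_obs, radius, u_nom, gp_mean, gp_std,
                      alpha=1.0, k_delta=2.0):
    """Minimally modify u_nom so that a robust CBF condition holds.

    Assumed toy model: x_dot = u + d(x), with the GP error bound
    ||d - gp_mean|| <= k_delta * gp_std holding with high probability.
    """
    # Barrier h(x) = ||x - x_obs||^2 - radius^2 (nonnegative outside the obstacle).
    diff = [xi - oi for xi, oi in zip(x, x_obs)]
    h = sum(d * d for d in diff) - radius ** 2

    # Gradient of h, so that h_dot = a . (u + d).
    a = [2.0 * d for d in diff]
    a_norm = math.sqrt(sum(ai * ai for ai in a))

    # Robust condition: a.(u + gp_mean) - k_delta*||a||*gp_std >= -alpha*h,
    # rewritten as the half-space constraint a.u >= b.
    b = (-alpha * h
         - sum(ai * mi for ai, mi in zip(a, gp_mean))
         + k_delta * a_norm * gp_std)

    violation = b - sum(ai * ui for ai, ui in zip(a, u_nom))
    if violation <= 0.0 or a_norm == 0.0:
        # Nominal input already satisfies the condition, or the gradient
        # vanishes (state at the obstacle centre), where no projection exists.
        return list(u_nom)

    # Closed-form solution of min ||u - u_nom||^2 s.t. a.u >= b.
    scale = violation / (a_norm ** 2)
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    # Nominal input drives straight at the obstacle; the filter slows it down.
    u_safe = robust_cbf_filter(
        x=[2.0, 0.0], x_obs=[0.0, 0.0], radius=1.0,
        u_nom=[-5.0, 0.0], gp_mean=[0.0, 0.0], gp_std=0.1,
    )
    print(u_safe)
```

In a structure like the one the abstract describes, `u_nom` would be the sum of the RL action and the reference controller's output, and the filtered input is what would actually be executed during exploration. A larger `gp_std` tightens the constraint, which is how model uncertainty translates into more conservative actions in this sketch.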