Journal
AUTOMATICA
Volume 148, Issue -, Pages -
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.automatica.2022.110761
Keywords
Reinforcement learning; Linear quadratic tracking control; Discounted cost function; Singular perturbation theory
Abstract
This paper considers the problem of linear quadratic tracking control (LQTC) with a discounted cost function for unknown systems. The existing design methods often require the discount factor to be small enough to guarantee the closed-loop stability. However, solving the discounted algebraic Riccati equation (ARE) may lead to ill-conditioned numerical issues if the discount factor is too small. By singular perturbation theory, we decompose the full-order discounted ARE into a reduced-order ARE and a Sylvester equation, which facilitate designing the feedback and feedforward control gains. The obtained controller is proved to be a stabilizing and near-optimal solution to the original LQTC problem. In the framework of reinforcement learning, both on-policy and off-policy two-phase learning algorithms are derived to design the near-optimal tracking control policy without knowing the discount factor. The advantages of the developed results are illustrated by comparative simulation results. (c) 2022 Published by Elsevier Ltd.
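The ill-conditioning issue the abstract describes can be illustrated with a standard identity for discounted discrete-time LQR (not the paper's singular-perturbation decomposition): scaling the system matrices by the square root of the discount factor turns the discounted algebraic Riccati equation into a standard one, so an off-the-shelf solver can be applied. The sketch below is a minimal illustration under assumed, hypothetical system matrices A, B, weights Q, R, and discount factor gamma; the paper's reduced-order ARE and Sylvester-equation design is not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical two-state system and quadratic weights (illustrative only,
# not taken from the paper).
A = np.array([[1.0, 0.1],
              [0.0, 0.95]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 0.9  # discount factor

# Discounted DARE identity: with Ag = sqrt(gamma)*A and Bg = sqrt(gamma)*B,
# the discounted Riccati equation becomes a standard DARE in (Ag, Bg, Q, R).
Ag = np.sqrt(gamma) * A
Bg = np.sqrt(gamma) * B
P = solve_discrete_are(Ag, Bg, Q, R)

# Feedback gain of the discounted LQR problem:
# K = (R + gamma*B'PB)^{-1} * gamma*B'PA.
K = np.linalg.solve(R + Bg.T @ P @ Bg, Bg.T @ P @ Ag)

# The discounted solution only guarantees that sqrt(gamma)*(A - B K) is
# Schur stable; stability of the undiscounted closed loop A - B K depends
# on gamma, which is exactly the gap the paper's analysis targets.
rho_discounted = np.max(np.abs(np.linalg.eigvals(Ag - Bg @ K)))
rho_true = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print(rho_discounted, rho_true)
```

Note that as gamma shrinks, the guaranteed contraction of sqrt(gamma)*(A - B K) says less and less about the true closed loop, which is why small discount factors are both numerically and theoretically delicate.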