Article

A Novel Nonlinear Deep Reinforcement Learning Controller for DC-DC Power Buck Converters

Journal

IEEE Transactions on Industrial Electronics
Volume 68, Issue 8, Pages 6849-6858

Publisher

IEEE (Institute of Electrical and Electronics Engineers), Inc.
DOI: 10.1109/TIE.2020.3005071

Keywords

Buck converters; Observers; Fuel cells; Mathematical model; Voltage control; Capacitors; Reinforcement learning; Buck converter; constant power load (CPL); deep deterministic policy gradient (DDPG); sliding mode (SM) observer; ultralocal model (ULM)

Abstract

This article presents an intelligent proportional-integral (iPI) controller based on a sliding mode observer to mitigate the impedance instabilities of nonideal constant power loads (CPLs), together with a deep deterministic policy gradient (DDPG) controller that reduces the observer estimation error and enhances the dynamic characteristics of dc-dc buck converters.
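For orientation, the sketch below writes out the generic ultralocal model (ULM) and the intelligent PI (iPI) law that this class of design builds on; here y is the regulated output voltage, y* its reference, F lumps the unknown dynamics and the CPL disturbance, F̂ is its sliding-mode-observer estimate, and α, K_P, K_I are design gains. The notation follows the general model-free control literature and is an assumption, not necessarily the paper's exact formulation.

```latex
% Generic ultralocal model (ULM) and iPI control law -- notation assumed,
% not taken verbatim from the paper.
\begin{align}
  \dot{y} &= F + \alpha u
  && \text{(ULM: } F \text{ lumps unknown dynamics and the CPL disturbance)} \\
  u &= \frac{1}{\alpha}\Bigl(-\hat{F} + \dot{y}^{*}
        - K_{P}\,e - K_{I}\!\int e\,\mathrm{d}t\Bigr),
  \qquad e = y - y^{*}.
\end{align}
```

With a perfect estimate F̂ = F the tracking error obeys a simple PI-stabilized dynamic; the DDPG term described in the abstract then acts as an auxiliary correction on top of this law when the estimate is imperfect.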
Nonlinearities and unmodeled dynamics inevitably degrade the quality and reliability of power conversion and, as a result, pose significant challenges to high-performance voltage stabilization of dc-dc buck converters. The stability of such power electronic equipment is further threatened when feeding nonideal constant power loads (CPLs) because of the negative impedance characteristics they induce. In response to these challenges, advanced regulation and control mechanisms for the converters need to be developed so that these interface systems can be deployed efficiently in a microgrid configuration. This article presents an intelligent proportional-integral (iPI) controller based on a sliding mode (SM) observer to mitigate the destructive impedance instabilities of nonideal, time-varying CPLs in the ultralocal model (ULM) sense. In particular, an auxiliary deep deterministic policy gradient (DDPG) controller is developed adaptively to reduce the observer estimation error and further improve the dynamic characteristics of dc-dc buck converters. The DDPG design has two parts: (i) an actor network that generates the policy commands and (ii) a critic network that evaluates the quality of the policy commands generated by the actor. The proposed strategy lets the DDPG-based control compensate for what the iPI-based SM observer cannot. In this application, the weight coefficients of the actor and critic networks are trained on the reward feedback of the voltage error, using a gradient descent scheme. Finally, to demonstrate the merits and implementation feasibility of the proposed method, experimental results on a laboratory prototype of a dc-dc buck converter feeding a time-varying CPL are presented.
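The actor/critic split described in the abstract can be illustrated with a short PyTorch sketch. Everything below (network widths, SGD learning rates, the squared-voltage-error reward, the two-element state) is an illustrative assumption rather than the configuration reported in the paper; it only shows the mechanics of one gradient-descent update of the two networks.

```python
# Minimal DDPG actor-critic update sketch: the actor proposes a bounded
# duty-cycle correction, the critic scores it, and both are trained by
# gradient descent on a reward built from the voltage error.
# All sizes, gains, and the reward shape are illustrative assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the measured state (e.g., voltage error and its integral) to a
    bounded auxiliary control command."""
    def __init__(self, state_dim: int = 2, action_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),   # correction bounded in [-1, 1]
        )
    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Estimates Q(s, a): the long-run value of applying action a in state s."""
    def __init__(self, state_dim: int = 2, action_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

actor, critic = Actor(), Critic()
actor_opt = torch.optim.SGD(actor.parameters(), lr=1e-3)    # plain gradient descent
critic_opt = torch.optim.SGD(critic.parameters(), lr=1e-3)
gamma = 0.99                                                # discount factor

def ddpg_update(state, action, reward, next_state):
    """One DDPG step on a single transition (no replay buffer or target
    networks, to keep the sketch short)."""
    # Critic: minimise the TD error against a bootstrapped target.
    with torch.no_grad():
        target_q = reward + gamma * critic(next_state, actor(next_state))
    critic_loss = nn.functional.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's value of its own policy.
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return critic_loss.item(), actor_loss.item()

# Example transition: the reward penalises the squared output-voltage error.
v_ref, v_out, v_out_next = 48.0, 47.2, 47.6           # volts (illustrative)
state      = torch.tensor([[v_ref - v_out, 0.0]])
next_state = torch.tensor([[v_ref - v_out_next, 0.0]])
action     = actor(state).detach()
reward     = torch.tensor([[-(v_ref - v_out_next) ** 2]])
ddpg_update(state, action, reward, next_state)
```

A practical implementation would add a replay buffer, target networks, and exploration noise, which are standard parts of DDPG but are omitted here for brevity.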

