Article

Risk-averse policy optimization via risk-neutral policy optimization

Journal

ARTIFICIAL INTELLIGENCE
Volume 311, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.artint.2022.103765

Keywords

Reinforcement learning; Risk-aversion; Risk-sensitivity

Abstract

This paper introduces a unified framework for optimizing various risk measures, including conditional value-at-risk, utility functions, and mean-variance. Leveraging recent theoretical results on state augmentation, the decision-making process is transformed so that optimizing the chosen risk measure in the original environment is equivalent to optimizing the expected cost in the transformed one, and a simple risk-sensitive meta-algorithm is presented that can be combined with any risk-neutral policy optimization method. Extensive experiments demonstrate the advantages of this approach over existing ad-hoc methodologies in different domains.
Keeping risk under control is a primary objective in many critical real-world domains, including finance and healthcare. The literature on risk-averse reinforcement learning (RL) has mostly focused on designing ad-hoc algorithms for specific risk measures. As such, most of these algorithms do not easily generalize to measures other than the one they are designed for. Furthermore, it is often unclear whether state-of-the-art risk-neutral RL algorithms can be extended to reduce risk. In this paper, we take a step towards overcoming these limitations, proposing a single framework to optimize some of the most popular risk measures, including conditional value-at-risk, utility functions, and mean-variance. Leveraging recent theoretical results on state augmentation, we transform the decision-making process so that optimizing the chosen risk measure in the original environment is equivalent to optimizing the expected cost in the transformed one. We then present a simple risk-sensitive meta-algorithm that transforms the trajectories it collects from the environment and feeds these into any risk-neutral policy optimization method. Finally, we provide extensive experiments that show the benefits of our approach over existing ad-hoc methodologies in different domains, including the MuJoCo robotic suite and a real-world trading dataset. © 2022 Elsevier B.V. All rights reserved.
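
The abstract describes a collect-transform-optimize loop: trajectories are gathered in the original environment, rewritten according to the chosen risk measure via state augmentation, and then handed to any off-the-shelf risk-neutral policy optimizer. The sketch below illustrates this idea for the CVaR case only, using the well-known budget-augmentation construction (CVaR_a(Z) = min_b { b + (1/a) E[(Z - b)_+] }); it is not the paper's actual algorithm or API. The function names (transform_trajectory_cvar, collect_trajectories, risk_neutral_update) and the fixed budget parameter b are hypothetical and purely illustrative.

```python
import numpy as np


def transform_trajectory_cvar(trajectory, alpha, b):
    """Illustrative CVaR-style trajectory transformation.

    Takes a list of (state, action, cost) tuples from the original environment,
    augments each state with the remaining cost budget, and charges a single
    terminal cost max(total_cost - b, 0) / alpha. The expected cost of the
    transformed trajectory then matches the inner CVaR objective
    b + (1/alpha) * E[(total_cost - b)_+] up to the constant b.
    """
    augmented = []
    budget = b
    for t, (state, action, cost) in enumerate(trajectory):
        aug_state = np.append(state, budget)   # state augmented with remaining budget
        budget -= cost                         # budget shrinks as cost accumulates
        is_last = (t == len(trajectory) - 1)
        # Intermediate steps incur zero transformed cost; at the end the agent
        # pays the excess of the accumulated cost over the budget, scaled by 1/alpha.
        transformed_cost = max(-budget, 0.0) / alpha if is_last else 0.0
        augmented.append((aug_state, action, transformed_cost))
    return augmented


def risk_sensitive_meta_loop(collect_trajectories, risk_neutral_update, policy,
                             alpha=0.1, b=0.0, iterations=100):
    """Sketch of the meta-loop: collect, transform, then delegate the update to
    an arbitrary risk-neutral policy optimizer (both callables are user-supplied).
    The budget b is kept fixed here for simplicity; in a full treatment it would
    also be optimized or carried as part of the augmented state.
    """
    for _ in range(iterations):
        trajectories = collect_trajectories(policy)
        transformed = [transform_trajectory_cvar(tau, alpha, b)
                       for tau in trajectories]
        policy = risk_neutral_update(policy, transformed)
    return policy
```

Because the transformation only touches the trajectories, the risk-neutral optimizer never needs to know which risk measure is being targeted; swapping CVaR for, say, a utility-function or mean-variance objective would only require a different trajectory transformation under this assumed interface.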
