Article

Risk-averse policy optimization via risk-neutral policy optimization

Journal

ARTIFICIAL INTELLIGENCE
Volume 311, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.artint.2022.103765

Keywords

Reinforcement learning; Risk-aversion; Risk-sensitivity

Abstract

This paper introduces a unified framework for optimizing various risk measures, including conditional value-at-risk, utility functions, and mean-variance. Leveraging recent theoretical results on state augmentation, the decision-making process is transformed so that optimizing the chosen risk measure in the original environment is equivalent to optimizing the expected cost in the transformed one, and a simple risk-sensitive meta-algorithm is presented that can be combined with any risk-neutral policy optimization method. Extensive experiments demonstrate the advantages of this approach over existing ad-hoc methodologies in different domains.
Keeping risk under control is a primary objective in many critical real-world domains, including finance and healthcare. The literature on risk-averse reinforcement learning (RL) has mostly focused on designing ad-hoc algorithms for specific risk measures. As such, most of these algorithms do not easily generalize to measures other than the one they are designed for. Furthermore, it is often unclear whether state-of-the-art risk-neutral RL algorithms can be extended to reduce risk. In this paper, we take a step towards overcoming these limitations, proposing a single framework to optimize some of the most popular risk measures, including conditional value-at-risk, utility functions, and mean-variance. Leveraging recent theoretical results on state augmentation, we transform the decision-making process so that optimizing the chosen risk measure in the original environment is equivalent to optimizing the expected cost in the transformed one. We then present a simple risk-sensitive meta-algorithm that transforms the trajectories it collects from the environment and feeds these into any risk-neutral policy optimization method. Finally, we provide extensive experiments that show the benefits of our approach over existing ad-hoc methodologies in different domains, including the MuJoCo robotic suite and a real-world trading dataset. © 2022 Elsevier B.V. All rights reserved.
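
The abstract describes a collect-transform-optimize loop: trajectories are gathered in the original environment, rewritten according to the chosen risk measure via state augmentation, and then handed to any off-the-shelf risk-neutral policy optimizer. The sketch below illustrates this idea for the CVaR case only, using the well-known budget-augmentation construction (CVaR_a(Z) = min_b { b + (1/a) E[(Z - b)_+] }); it is not the paper's actual algorithm or API. The function names (transform_trajectory_cvar, collect_trajectories, risk_neutral_update) and the fixed budget parameter b are hypothetical and purely illustrative.

```python
import numpy as np


def transform_trajectory_cvar(trajectory, alpha, b):
    """Illustrative CVaR-style trajectory transformation.

    Takes a list of (state, action, cost) tuples from the original environment,
    augments each state with the remaining cost budget, and charges a single
    terminal cost max(total_cost - b, 0) / alpha. The expected cost of the
    transformed trajectory then matches the inner CVaR objective
    b + (1/alpha) * E[(total_cost - b)_+] up to the constant b.
    """
    augmented = []
    budget = b
    for t, (state, action, cost) in enumerate(trajectory):
        aug_state = np.append(state, budget)   # state augmented with remaining budget
        budget -= cost                         # budget shrinks as cost accumulates
        is_last = (t == len(trajectory) - 1)
        # Intermediate steps incur zero transformed cost; at the end the agent
        # pays the excess of the accumulated cost over the budget, scaled by 1/alpha.
        transformed_cost = max(-budget, 0.0) / alpha if is_last else 0.0
        augmented.append((aug_state, action, transformed_cost))
    return augmented


def risk_sensitive_meta_loop(collect_trajectories, risk_neutral_update, policy,
                             alpha=0.1, b=0.0, iterations=100):
    """Sketch of the meta-loop: collect, transform, then delegate the update to
    an arbitrary risk-neutral policy optimizer (both callables are user-supplied).
    The budget b is kept fixed here for simplicity; in a full treatment it would
    also be optimized or carried as part of the augmented state.
    """
    for _ in range(iterations):
        trajectories = collect_trajectories(policy)
        transformed = [transform_trajectory_cvar(tau, alpha, b)
                       for tau in trajectories]
        policy = risk_neutral_update(policy, transformed)
    return policy
```

Because the transformation only touches the trajectories, the risk-neutral optimizer never needs to know which risk measure is being targeted; swapping CVaR for, say, a utility-function or mean-variance objective would only require a different trajectory transformation under this assumed interface.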
