Article

Model-free reinforcement learning with model-based safe exploration: Optimizing adaptive recovery process of infrastructure systems

Journal

STRUCTURAL SAFETY
Volume 80, Pages 46-55

Publisher

ELSEVIER
DOI: 10.1016/j.strusafe.2019.04.003

Keywords

Reinforcement learning; Extreme events; Resilience; Infrastructure systems

Funding

  1. National Science Foundation [1663479]
  2. Directorate For Engineering
  3. Division of Civil, Mechanical and Manufacturing Innovation [1663479] (Funding Source: National Science Foundation)


Extreme events are not only among the most damaging events for our society and environment, but also among the most difficult to predict. Model-based predictions of the disruptions that extreme events induce on urban infrastructure systems are often unreliable, because these events are, by definition, rare. In particular, characterizing the effect of such disruptions on urban infrastructure with a parameterized model is difficult. Model-free approaches based on recent advances in reinforcement learning, on the other hand, can capture the complex dynamics of urban society and infrastructure under the risk of extreme events without relying on any specific physics-based mechanism. However, these approaches usually require random exploration of the effects of management actions on the system (typically in the post-event situation) to reach an acceptable approximation of the optimal management policy. For costly infrastructure systems and important communities, such random exploration can be unacceptable and risky. In this paper, we propose Safe Q-learning, a model-free reinforcement learning approach augmented with model-based safe exploration, for near-optimal management of infrastructure systems before an event and of their recovery after it. The method requires the decision-maker to model the structure of the state space of the problem and a suitable equilibrium of the system (its optimal pre-event functionality). This information is usually available for urban systems, since they spend long periods in that optimal equilibrium before such events occur. We show on several infrastructure management examples that the proposed approach achieves near-optimal performance without the risk associated with random exploration.
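To make the idea of "model-free learning with model-based safe exploration" more concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' Safe Q-learning algorithm: the toy recovery problem, the `worst_case_next`/`safe_actions` helpers, the action names, and all parameters are illustrative assumptions. The only ingredients taken from the abstract are a known state-space structure and a known equilibrium (full pre-event functionality). Here they are used to forbid any action, exploratory or greedy, whose worst-case predicted outcome moves the system away from that equilibrium, while the action values themselves are learned with ordinary tabular Q-learning.

```python
import random

# Hypothetical toy recovery problem (an assumption, not from the paper):
# functionality levels 0..N_LEVELS, where N_LEVELS is the pre-event equilibrium.
N_LEVELS = 5
ACTIONS = ("wait", "repair", "reconfigure")


def step(state, action, rng):
    """True dynamics of the toy system; unknown to the learning agent."""
    if action == "repair":
        nxt, reward = min(state + 1, N_LEVELS), -1.0  # steady gain, fixed cost
    elif action == "reconfigure":
        # Risky action: sometimes jumps ahead, but often collapses functionality.
        nxt = min(state + 2, N_LEVELS) if rng.random() < 0.3 else 0
        reward = -0.5
    else:  # "wait"
        nxt, reward = state, 0.0
    reward -= N_LEVELS - nxt  # penalty for functionality still lost
    return nxt, reward


def worst_case_next(state, action):
    """Hypothetical coarse model supplied by the decision-maker (worst-case outcome)."""
    if action == "repair":
        return min(state + 1, N_LEVELS)
    if action == "reconfigure":
        return 0
    return state


def safe_actions(state):
    """Actions whose worst-case outcome does not move away from the equilibrium."""
    return [a for a in ACTIONS if worst_case_next(state, a) >= state]


def safe_q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2,
                    horizon=20, seed=0):
    """Tabular Q-learning in which both exploration and the max are restricted to safe actions."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_LEVELS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randrange(N_LEVELS + 1)  # random post-event damage level
        for _ in range(horizon):
            allowed = safe_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(allowed)  # exploration limited to the safe set
            else:
                action = max(allowed, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action, rng)
            best_next = max(q[(nxt, a)] for a in safe_actions(nxt))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """Greedy policy over the safe action set of each state."""
    return {s: max(safe_actions(s), key=lambda a: q[(s, a)])
            for s in range(N_LEVELS + 1)}


if __name__ == "__main__":
    print(greedy_policy(safe_q_learning()))
```

In this toy setting the risky `reconfigure` action is only ever tried at functionality level 0, where its worst case cannot make things worse, so no exploratory step can collapse a partially recovered system. The price of this conservative filter is that potentially useful risky actions are never evaluated elsewhere. How the paper actually balances that trade-off is not described in the abstract.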
