4.5 Article

Large-Scale Wildfire Mitigation Through Deep Reinforcement Learning

Journal

FRONTIERS IN FORESTS AND GLOBAL CHANGE
Volume 5

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/ffgc.2022.734330

Keywords

forest management; wildfire mitigation; Markov Decision Process; dynamic programming; deep reinforcement learning


This paper proposes a Deep Reinforcement Learning (DRL) approach for forest management, aiming to track the dynamics of large-scale forests and to prevent or mitigate wildfire risks. By designing a spatial Markov Decision Process (MDP) model and using an off-policy actor-critic algorithm with experience replay, the approach scales well to providing forest management plans. It outperforms a genetic algorithm (GA) and matches the exact MDP solution for low-dimensional models.
Forest management can be seen as a sequential decision-making problem: determining an optimal scheduling policy, e.g., harvest, thinning, or do-nothing, that can mitigate the risks of wildfire. Markov Decision Processes (MDPs) offer an efficient mathematical framework for optimizing forest management policies. However, computing optimal MDP solutions is computationally challenging for large-scale forests due to the curse of dimensionality: the total number of forest states grows exponentially with the number of stands into which the forest is discretized. In this work, we propose a Deep Reinforcement Learning (DRL) approach to improve forest management plans that track the forest dynamics over a large area. The approach emphasizes prevention and mitigation of wildfire risks by determining highly efficient management policies. A large-scale forest model is designed using a spatial MDP that divides the square-matrix forest into equal stands. The model makes the probability of wildfire dependent on the forest timber volume, the flammability, and the directional distribution of the wind, using data that reflects the inventory of a typical eucalypt (Eucalyptus globulus Labill) plantation in Portugal. In this spatial MDP, the agent (decision-maker) takes an action at one stand at each step. We use an off-policy actor-critic reinforcement learning approach with experience replay to approximate the MDP optimal policy. In three different case studies, the approach shows good scalability for providing large-scale forest management plans. The expected return and the computed DRL policy are found to be identical to the exact optimal MDP solution when that solution is available, i.e., for low-dimensional models. DRL is also found to outperform genetic algorithm (GA) solutions, which were used as benchmarks for the large-scale model policies.
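
To make the spatial MDP formulation more concrete, the sketch below implements a toy version of the environment described in the abstract: the forest is a square grid of equal stands, each stand carries a discrete timber-volume (age) class, the agent treats one stand per step, and each stand's wildfire probability grows with its standing volume and its exposure to a prevailing wind direction. All grid sizes, class counts, probabilities, and reward values are illustrative assumptions, not the parameters fitted to the Portuguese eucalypt inventory in the paper.

```python
import numpy as np

# Toy spatial MDP for forest management (illustrative assumptions throughout).
N = 5                      # forest discretized into an N x N grid of stands (assumed size)
N_AGE_CLASSES = 6          # per-stand timber volume/age classes (assumed discretization)
ACTIONS = ("do_nothing", "thin", "harvest")

rng = np.random.default_rng(0)

def fire_probability(age, wind_exposure):
    """Assumed wildfire probability: grows with standing timber volume
    (proxied by the age class) and with exposure to the prevailing wind."""
    base = 0.02 + 0.03 * age / (N_AGE_CLASSES - 1)
    return min(1.0, base * (1.0 + wind_exposure))

def step(state, stand, action):
    """One decision step: the agent acts on a single stand, then every stand
    grows or burns stochastically. Reward is harvested volume minus volume
    lost to fire (illustrative numbers)."""
    state = state.copy()
    reward = 0.0
    if action == "harvest":
        reward += float(state[stand])          # revenue proportional to standing volume
        state[stand] = 0
    elif action == "thin":
        reward += 0.3 * float(state[stand])
        state[stand] = max(state[stand] - 1, 0)

    for idx in np.ndindex(state.shape):
        # Wind assumed to blow from the west: stands east of a high-volume
        # neighbour are treated as more exposed (crude directional effect).
        west = (idx[0], idx[1] - 1)
        wind_exposure = 0.5 if idx[1] > 0 and state[west] >= N_AGE_CLASSES - 2 else 0.0
        if rng.random() < fire_probability(state[idx], wind_exposure):
            reward -= float(state[idx])        # volume lost to wildfire
            state[idx] = 0
        else:
            state[idx] = min(state[idx] + 1, N_AGE_CLASSES - 1)
    return state, reward

# Example rollout of a trivial "harvest the oldest stand" policy.
state = rng.integers(0, N_AGE_CLASSES, size=(N, N))
for t in range(3):
    stand = np.unravel_index(np.argmax(state), state.shape)
    state, r = step(state, stand, "harvest")
    print(f"step {t}: reward {r:.1f}")
```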
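The optimization side can be sketched in a similarly hedged way. The snippet below shows one generic off-policy actor-critic update with experience replay: replayed transitions are sampled from a buffer, the stale behaviour policy is corrected for with a truncated importance weight, and an advantage actor-critic loss is applied. It is not the authors' exact algorithm or network architecture; the state and action dimensions, network sizes, and hyperparameters are all assumed for illustration.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical dimensions: flattened stand volumes as the state,
# one discrete action per (stand, treatment) pair.
STATE_DIM, N_ACTIONS = 25, 75
GAMMA = 0.99

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
value = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
optim = torch.optim.Adam(list(policy.parameters()) + list(value.parameters()), lr=1e-3)
replay = deque(maxlen=10_000)          # experience replay buffer

def update(batch_size=32):
    """One off-policy update: sample replayed transitions, correct for the
    behaviour policy with a truncated importance weight, and apply an
    advantage actor-critic loss."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s2, mu = map(torch.stack, zip(*batch))      # mu: behaviour-policy prob of a
    dist = torch.distributions.Categorical(logits=policy(s))
    log_pi = dist.log_prob(a)
    rho = (log_pi.exp() / mu).clamp(max=10.0).detach()   # truncated importance weight
    with torch.no_grad():
        target = r + GAMMA * value(s2).squeeze(-1)       # one-step TD target
    v = value(s).squeeze(-1)
    advantage = (target - v).detach()
    loss = -(rho * advantage * log_pi).mean() + (target - v).pow(2).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()

# Populate the buffer with dummy transitions and run one update (illustrative only).
for _ in range(64):
    s = torch.rand(STATE_DIM)
    a = torch.randint(N_ACTIONS, ())
    replay.append((s, a, torch.rand(()), torch.rand(STATE_DIM), torch.rand(()) * 0.9 + 0.1))
update()
```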
