Article

Reinforcement Learning for Outdoor Balloon Navigation: A Successful Controller for an Autonomous Balloon

Journal

IEEE ROBOTICS & AUTOMATION MAGAZINE

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/MRA.2023.3271203

Keywords

Wind; Navigation; Buoyancy; Aerospace electronics; Encoding; Atmospheric modeling; Training


Autonomous ballooning presents challenges for planning and control algorithms due to limited control authority, the stochastic nature of balloon flight caused by wind, and the difficulty of sensing wind remotely. This study uses reinforcement learning to develop a control policy for autonomous balloon navigation in a varying wind field. The approach is evaluated in simulation and in indoor and outdoor experiments, demonstrating successful navigation toward target positions with small final distance errors.
Autonomous ballooning allows for energy-efficient long-range missions but introduces significant challenges for planning and control algorithms due to the balloon's single degree of actuation: vertical rate control through either buoyancy or vertical thrust. Lateral motion is driven almost entirely by the wind; balloon flight is therefore both nonholonomic and often stochastic. Finally, wind is very challenging to sense remotely, and estimates are typically available only from low-temporal- and low-spatial-frequency predictions of large-scale weather models and from direct in situ measurements. In this work, reinforcement learning (RL) is used to generate a control policy for an autonomous balloon navigating between 3D positions in a temporally and spatially varying wind field. The agent uses its position and velocity, the relative position of the target, and an estimate of the surrounding wind field to command a target altitude. The wind information combines local measurements with an encoding of global wind predictions from a large-scale numerical weather prediction (NWP) model around the current balloon location. The RL algorithm used in this work, soft actor-critic (SAC), is trained with a reward favoring paths that come as close as possible to the target while minimizing time and actuation costs. We evaluate our approach first in simulation and then in a controlled indoor experiment, where we generate an artificial wind field and reach a median distance of 23.4 cm from the target within a 3.5 x 3.5 x 3.5 m volume over 30 trials. Finally, using a fully autonomous, custom-designed outdoor prototype with altitude control, long-range communication, redundant localization, and onboard computation, we validate our approach in a real-world setting. Over six flights, the agent navigates to predefined target positions with an average target distance error of 360 m after traveling approximately 10 km within a volume of 22 x 22 x 3.2 km.
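
Since the abstract only outlines the observation encoding and reward structure, the following minimal Python sketch illustrates one plausible arrangement of the agent's input (balloon state, relative target, local wind, and an NWP encoding) and a shaped reward balancing target distance, time, and actuation cost. All function names, dimensions, and weighting constants are illustrative assumptions, not the authors' implementation.

    import numpy as np

    # Minimal sketch, assuming a flat observation vector and a shaped reward;
    # names, dimensions, and weights are illustrative, not taken from the paper.

    def build_observation(pos, vel, target_pos, local_wind, nwp_encoding):
        """Assemble the agent's input: balloon state, relative target, and wind info.

        pos, vel     : 3D balloon position and velocity.
        target_pos   : 3D target position (encoded relative to the balloon).
        local_wind   : in situ wind measurement at the balloon.
        nwp_encoding : fixed-length encoding of the NWP wind prediction
                       around the current balloon location.
        """
        rel_target = np.asarray(target_pos) - np.asarray(pos)
        return np.concatenate([pos, vel, rel_target, local_wind, nwp_encoding])

    def step_reward(pos, target_pos, actuation, w_dist=1.0, w_time=0.01, w_act=0.1):
        """Reward favoring proximity to the target, with time and actuation penalties."""
        dist = np.linalg.norm(np.asarray(target_pos) - np.asarray(pos))
        return -(w_dist * dist + w_time + w_act * abs(actuation))

Wrapped in a Gym-style environment, such an observation/reward pair could be trained with an off-the-shelf SAC implementation, for example Stable-Baselines3's SAC("MlpPolicy", env).learn(total_timesteps=...); the paper does not state which SAC implementation was used, so this pairing is only a suggestion.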
