Article

A Q-Learning Approach to Flocking With UAVs in a Stochastic Environment

Journal

IEEE TRANSACTIONS ON CYBERNETICS
Volume 47, Issue 1, Pages 186-197

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2015.2509646

Keywords

Flocking; Q-learning; reinforcement learning (RL); unmanned aerial vehicles (UAVs)

Abstract

In the past two decades, unmanned aerial vehicles (UAVs) have demonstrated their efficacy in supporting both military and civilian applications, where tasks can be dull, dirty, dangerous, or simply too costly with conventional methods. Many of these applications contain tasks that can be executed in parallel, so the natural progression is to deploy multiple UAVs working together as a force multiplier. Doing so, however, requires autonomous coordination among the UAVs, similar to the swarming behaviors seen in animals and insects. This paper treats flocking with small fixed-wing UAVs as a model-free reinforcement learning problem. In particular, Peng's Q(λ) with a variable learning rate is employed by the followers to learn a control policy that facilitates flocking in a leader-follower topology. The problem is structured as a Markov decision process, where the agents are modeled as small fixed-wing UAVs that experience stochasticity due to disturbances such as wind and control noise, as well as weight-and-balance issues. Learned policies are compared to policies computed with stochastic optimal control (i.e., dynamic programming) by evaluating the average cost incurred during flight according to a cost function. Simulation results demonstrate the feasibility of the proposed learning approach at enabling agents to learn how to flock in a leader-follower topology while operating in a nonstationary stochastic environment.
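To give a concrete sense of the learning rule named in the abstract, the sketch below shows a minimal tabular implementation of Peng's Q(λ), which, unlike Watkins' Q(λ), does not cut eligibility traces on exploratory actions. The paper's state/action discretization, cost function, and exact learning-rate schedule are not reproduced here; the class name, the ε-greedy exploration, and the visit-count-based learning rate are illustrative assumptions only.

```python
# Minimal sketch of Peng's Q(lambda) with a variable (visit-count-decayed)
# learning rate. This illustrates the update rule only; it is not the
# paper's implementation (states, actions, and costs are application-specific).
from collections import defaultdict
import random


class PengQLambdaAgent:
    def __init__(self, actions, gamma=0.95, lam=0.8, epsilon=0.1):
        self.actions = list(actions)
        self.gamma, self.lam, self.epsilon = gamma, lam, epsilon
        self.Q = defaultdict(float)        # Q[(state, action)]
        self.trace = defaultdict(float)    # eligibility traces
        self.visits = defaultdict(int)     # visit counts used for the learning rate

    def value(self, state):
        # State value V(s) = max_a Q(s, a)
        return max(self.Q[(state, a)] for a in self.actions)

    def select_action(self, state):
        # epsilon-greedy exploration
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def alpha(self, state, action):
        # Assumed variable learning rate: decays with the number of visits.
        return 1.0 / (1.0 + self.visits[(state, action)])

    def update(self, s, a, reward, s_next):
        self.visits[(s, a)] += 1
        v_s, v_next = self.value(s), self.value(s_next)
        # Peng's Q(lambda) uses two TD errors:
        delta_prime = reward + self.gamma * v_next - v_s          # propagated via traces
        delta = reward + self.gamma * v_next - self.Q[(s, a)]     # applied at (s, a) only
        # Decay all traces and apply the trace-weighted correction.
        for key in list(self.trace.keys()):
            self.trace[key] *= self.gamma * self.lam
            self.Q[key] += self.alpha(*key) * self.trace[key] * delta_prime
        # One-step correction and trace increment for the current pair.
        self.Q[(s, a)] += self.alpha(s, a) * delta
        self.trace[(s, a)] += 1.0

    def end_episode(self):
        self.trace.clear()
```

In a leader-follower setting such as the one described above, each follower would hold its own agent, call select_action on its (discretized) relative state with respect to the leader at every control step, and call update with the negated flight cost as the reward; rewards would be stochastic because of the wind and control-noise disturbances.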
