4.7 Article

Reinforcement Learning-Based NOMA Power Allocation in the Presence of Smart Jamming

Journal

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY
Volume 67, Issue 4, Pages 3377-3389

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TVT.2017.2782726

Keywords

Nonorthogonal multiple access (NOMA); smart jamming; power allocation; game theory; reinforcement learning

Funding

  1. National Natural Science Foundation of China [61671396, 91638204]
  2. U.S. National Science Foundation [CCF-1420575, CNS-1456793, ECCS-1307949, EARS-1444009]
  3. Open Research Fund of the National Mobile Communications Research Laboratory, Southeast University [2018D08]
  4. National Science Foundation, Division of Computing and Communication Foundations [1420575]

Abstract

Nonorthogonal multiple access (NOMA) systems are vulnerable to jamming attacks, especially from smart jammers that use programmable radio devices such as software-defined radios to flexibly adapt their jamming strategy to the ongoing NOMA transmission and the radio environment. In this paper, the power allocation of a base station in a NOMA system equipped with multiple antennas contending with a smart jammer is formulated as a zero-sum game, in which the base station, as the leader, first chooses the transmit power on multiple antennas, and the jammer, as the follower, then selects the jamming power to interrupt the transmission of the users. A Stackelberg equilibrium of the antijamming NOMA transmission game is derived, and conditions assuring its existence are provided to disclose the impact of multiple antennas and radio channel states. A reinforcement learning-based power control scheme is proposed for the downlink NOMA transmission that does not require knowledge of the jamming and radio channel parameters. The Dyna architecture, which builds a learned world model from the real antijamming transmission experience, and the hotbooting technique, which exploits experiences in similar scenarios to initialize the quality values, are used to accelerate the Q-learning-based power allocation and thus improve the communication efficiency of the NOMA transmission in the presence of smart jammers. Simulation results show that the proposed scheme can significantly increase the sum data rates of users, and thus the utilities, compared with the standard Q-learning-based strategy.
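
The learning loop summarized in the abstract can be illustrated with a minimal Dyna-Q sketch in which the Q-table is optionally initialized from a similar scenario, in the spirit of hotbooting. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three power levels, the toy jammer that simply mirrors the last transmit power index, the reward `log2(1 + p / (noise + j)) - 0.2 * p`, and the names `make_toy_env` and `dyna_q`. The paper's actual state definition, utility, and channel model are not given in this abstract.

```python
import math
import random

POWER_LEVELS = [1.0, 2.0, 4.0]  # hypothetical transmit power levels
JAM_LEVELS = [0.0, 1.0, 2.0]    # hypothetical jamming power levels


def make_toy_env(noise):
    """Build a toy step function; the state is the current jamming level index."""
    def step(state, action):
        p = POWER_LEVELS[action]
        j = JAM_LEVELS[state]
        # Assumed utility: rate-like term minus a transmit power cost.
        reward = math.log2(1.0 + p / (noise + j)) - 0.2 * p
        # Assumed follower behavior: the jammer mirrors the observed power index.
        next_state = action
        return reward, next_state
    return step


def dyna_q(step, steps=300, planning_steps=5, alpha=0.5, gamma=0.9,
           epsilon=0.1, q_init=None, seed=0):
    """Tabular Dyna-Q; q_init lets the caller pass a pre-trained Q-table."""
    rng = random.Random(seed)
    n_states, n_actions = len(JAM_LEVELS), len(POWER_LEVELS)
    if q_init is not None:
        q = [row[:] for row in q_init]  # copy so the caller's table is untouched
    else:
        q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # learned world model: (state, action) -> (reward, next_state)
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection on the real system.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        reward, next_state = step(state, action)
        # Q-learning update from the real experience.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        model[(state, action)] = (reward, next_state)
        # Planning: replay simulated experiences drawn from the learned model.
        for _ in range(planning_steps):
            (s, a), (r, s2) = rng.choice(list(model.items()))
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        state = next_state
    return q


# Hotbooting-style initialization: train in a similar (assumed) scenario first,
# then reuse the resulting Q-table as the starting point in the target scenario.
q_hot = dyna_q(make_toy_env(noise=0.6), seed=1)
q_table = dyna_q(make_toy_env(noise=0.5), q_init=q_hot, seed=2)
```

Setting `planning_steps=0` and `q_init=None` reduces the sketch to plain Q-learning, which is the baseline the abstract compares against.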
