Journal
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY
Volume 67, Issue 4, Pages 3377-3389
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TVT.2017.2782726
Keywords
Nonorthogonal multiple access (NOMA); smart jamming; power allocation; game theory; reinforcement learning
Funding
- National Natural Science Foundation of China [61671396, 91638204]
- U.S. National Science Foundation [CCF-1420575, CNS-1456793, ECCS-1307949, EARS-1444009]
- open research fund of National Mobile Communications Research Laboratory, Southeast University [2018D08]
- Division of Computing and Communication Foundations [1420575] Funding Source: National Science Foundation
Nonorthogonal multiple access (NOMA) systems are vulnerable to jamming attacks, especially from smart jammers that use programmable radio devices such as software-defined radios to adapt their jamming strategy to the ongoing NOMA transmission and radio environment. In this paper, the power allocation of a multiple-antenna base station in a NOMA system contending with a smart jammer is formulated as a zero-sum game, in which the base station, as the leader, first chooses the transmit power on its multiple antennas, and the jammer, as the follower, then selects the jamming power to interrupt the users' transmissions. A Stackelberg equilibrium of this antijamming NOMA transmission game is derived, and the conditions that guarantee its existence are provided to reveal the impact of multiple antennas and radio channel states. A reinforcement learning-based power control scheme is proposed for downlink NOMA transmission that does not require knowledge of the jamming and radio channel parameters. Two techniques accelerate the learning of the Q-learning-based power allocation and thus improve the communication efficiency of NOMA transmission in the presence of smart jammers: the Dyna architecture, which builds a learned world model from real antijamming transmission experience, and the hotbooting technique, which exploits experience from similar scenarios to initialize the quality (Q) values. Simulation results show that, compared with the standard Q-learning-based strategy, the proposed scheme can significantly increase the sum data rate of the users and hence the utility.
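The leader-follower game and the Dyna/hotbooting acceleration described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the paper's model: the multi-antenna allocation is collapsed into a single total power level, and the two-user NOMA rate model with perfect SIC, the channel gains, the power grids, the cost coefficients, the state definition (last observed jamming level) and all function names (utility, jammer_best_response, stackelberg_power, dyna_q, greedy) are hypothetical. Because the jammer best-responds to each power choice, the brute-force leader optimum is the Stackelberg strategy of this toy game; the learning agent only sees rewards and jamming levels, never the channel parameters.

```python
import math
import random

# Hypothetical toy parameters (assumptions, not values from the paper).
BS_POWERS = [0.5, 1.0, 1.5, 2.0]   # candidate total transmit powers of the base station
JAM_POWERS = [0.0, 0.5, 1.0, 1.5]  # candidate jamming powers of the smart jammer
NOISE = 0.1
BETA = 0.8                         # fraction of transmit power given to the far (weak) user
COST_BS, COST_JAM = 0.6, 0.3       # per-unit power costs of the base station and the jammer


def utility(p, j, h_far=0.5, h_near=1.0, g_far=0.6, g_near=0.6):
    """BS utility: two-user NOMA sum rate - BS power cost + jamming cost.

    Zero-sum: the jammer's utility is the negative of this value.
    """
    # Far user decodes its own signal, treating the near user's signal as interference.
    r_far = math.log2(1 + BETA * p * h_far / ((1 - BETA) * p * h_far + NOISE + g_far * j))
    # Near user cancels the far user's signal by SIC (assumed perfect here).
    r_near = math.log2(1 + (1 - BETA) * p * h_near / (NOISE + g_near * j))
    return r_far + r_near - COST_BS * p + COST_JAM * j


def jammer_best_response(a, **ch):
    """Follower: index of the jamming power that minimizes the BS utility for power index a."""
    return min(range(len(JAM_POWERS)),
               key=lambda k: utility(BS_POWERS[a], JAM_POWERS[k], **ch))


def stackelberg_power(**ch):
    """Leader: brute-force the power index that is best given the follower's best response."""
    return max(range(len(BS_POWERS)),
               key=lambda a: utility(BS_POWERS[a], JAM_POWERS[jammer_best_response(a, **ch)], **ch))


def greedy(q, s):
    """Greedy action (power index) of a Q-table in state s."""
    return max(range(len(q[s])), key=lambda a: q[s][a])


def dyna_q(steps=2000, planning=10, alpha=0.1, gamma=0.7, eps=0.2, q_init=None, seed=0, **ch):
    """Dyna-Q power allocation: state = last observed jamming level, action = BS power level."""
    rng = random.Random(seed)
    n_s, n_a = len(JAM_POWERS), len(BS_POWERS)
    # Hotbooting: start from a Q-table learned in a similar scenario, if one is given.
    q = [row[:] for row in q_init] if q_init else [[0.0] * n_a for _ in range(n_s)]
    model = {}  # learned world model: (state, action) -> (reward, next_state)
    s = 0
    for _ in range(steps):
        # Epsilon-greedy choice of transmit power.
        a = rng.randrange(n_a) if rng.random() < eps else greedy(q, s)
        k = jammer_best_response(a, **ch)          # the smart jammer reacts to the chosen power
        r = utility(BS_POWERS[a], JAM_POWERS[k], **ch)  # agent observes only this reward and k
        q[s][a] += alpha * (r + gamma * max(q[k]) - q[s][a])  # update from real experience
        model[(s, a)] = (r, k)
        for _ in range(planning):  # extra updates replayed from the learned world model
            (ps, pa), (pr, pk) = rng.choice(list(model.items()))
            q[ps][pa] += alpha * (pr + gamma * max(q[pk]) - q[ps][pa])
        s = k
    return q


if __name__ == "__main__":
    a_star = stackelberg_power()
    s_star = jammer_best_response(a_star)
    q_prior = dyna_q(g_far=0.5, g_near=0.7)                  # a "similar" scenario
    q_hot = dyna_q(steps=500, planning=20, q_init=q_prior)   # hotbooted, fewer real steps
    print("Stackelberg power:", BS_POWERS[a_star])
    print("Learned power:   ", BS_POWERS[greedy(q_hot, s_star)])
```

In this sketch hotbooting is nothing more than passing a Q-table learned under slightly different jammer channel gains as q_init; how much it shortens learning depends on how close the two scenarios are.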