Article

Decentralized Federated Reinforcement Learning for User-Centric Dynamic TFDD Control

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JSTSP.2022.3221671

Keywords

Heuristic algorithms; Resource management; Quality of service; Time-frequency analysis; Interference; Fading channels; Dynamic scheduling; Dynamic TFDD; decentralized partially observable Markov decision process; federated learning; multi-agent reinforcement learning; resource allocation

Abstract

The explosive growth of dynamic and heterogeneous data traffic brings great challenges for 5G and beyond mobile networks. To enhance network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet asymmetric and heterogeneous traffic demands while alleviating inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named the federated Wolpertinger deep deterministic policy gradient (FWDDPG) algorithm. The BSs decide their local time-frequency configurations through RL algorithms and achieve global training by exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space, and then utilize the Wolpertinger policy to reduce the mapping errors from the continuous action space back to the discrete action space. Simulation results demonstrate the superiority of our proposed algorithm over the benchmark algorithms in terms of system sum rate.
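The Wolpertinger step described in the abstract (a DDPG actor emits a continuous "proto-action", which is then mapped back onto the large discrete action space) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the action grid, the `q_value` critic, and all function names are hypothetical assumptions.

```python
import numpy as np

def wolpertinger_select(proto_action, discrete_actions, q_value, state, k=3):
    """Map a continuous proto-action to a discrete action, Wolpertinger-style:
    take the k nearest discrete actions, then let the critic Q(s, a) pick
    the best one. `discrete_actions` has shape (num_actions, action_dim)."""
    # k-nearest-neighbor refinement around the proto-action
    dists = np.linalg.norm(discrete_actions - proto_action, axis=1)
    nearest = discrete_actions[np.argsort(dists)[:k]]
    # critic evaluation over the candidate set
    q_vals = [q_value(state, a) for a in nearest]
    return nearest[int(np.argmax(q_vals))]

# Toy usage: a 1-D grid of 5 discrete actions and a stand-in critic
# that simply prefers larger actions (purely illustrative).
actions = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
best = wolpertinger_select(np.array([0.6]), actions,
                           lambda s, a: a[0], state=None, k=2)
print(best)  # -> [0.75]
```

Evaluating only the k nearest candidates, rather than the full discrete set, is what keeps this mapping tractable when the per-BS configuration space is large.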
