Article

Reinforcement Learning for Real-Time Optimization in NB-IoT Networks

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS (2019)

Journal

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS

Volume 37, Issue 6, Pages 1424-1440

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/JSAC.2019.2904366

Keywords

Narrowband Internet of Things; resource configuration; real-time optimization; reinforcement learning; cooperative learning

Categories

Engineering, Electrical & Electronic; Telecommunications

Funding

Engineering and Physical Sciences Research Council (EPSRC) [EP/R006466/1, EP/R006377/1]
EPSRC [EP/R006466/1, EP/R006377/1] Funding Source: UKRI

Abstract

NarrowBand Internet of Things (NB-IoT) is an emerging cellular-based technology that offers a range of flexible configurations for massive IoT radio access from groups of devices with heterogeneous requirements. A configuration specifies the amount of radio resource allocated to each group of devices for random access and for data transmission. Assuming no knowledge of the traffic statistics, an important challenge is how to determine, in an online fashion, the configuration that maximizes the long-term average number of IoT devices served at each transmission time interval (TTI). Given the complexity of searching for the optimal configuration, we first develop real-time configuration selection based on tabular Q-learning (tabular-Q), linear approximation-based Q-learning (LA-Q), and deep neural network-based Q-learning (DQN) in the single-parameter single-group scenario. Our results show that the proposed reinforcement learning-based approaches considerably outperform the conventional heuristic approaches based on load estimation (LE-URC) in terms of the number of served IoT devices. This result also indicates that LA-Q and DQN can be good alternatives to tabular-Q, achieving almost the same performance with much less training time. We further advance LA-Q and DQN via action aggregation (AA-LA-Q and AA-DQN) and via cooperative multi-agent learning (CMA-DQN) for the multi-parameter multi-group scenario, thereby solving the problem that Q-learning agents do not converge in high-dimensional configurations. In this scenario, the advantage of the proposed Q-learning approaches over the conventional LE-URC approach grows significantly as the number of configuration dimensions increases, and the CMA-DQN approach outperforms the other approaches in both throughput and training efficiency.
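The tabular-Q idea named in the abstract (pick a configuration at each TTI, observe how many devices were served, update a table) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's system model: the toy environment `step`, the constants `N_STATES`, `N_ACTIONS` and `TOTAL_UNITS`, the reward definition, and the hyperparameters are all hypothetical.

```python
import random

# Hypothetical toy setting, NOT the paper's NB-IoT model.
N_STATES = 6     # backlog levels 0..5 (backlog is capped at 5)
N_ACTIONS = 5    # action a = resource units given to random access (0..4)
TOTAL_UNITS = 4  # the remaining TOTAL_UNITS - a units go to data transmission


def step(backlog, action, rng):
    """Simulate one toy TTI and return (served, next_backlog).

    A device is served only if it gets both a random-access unit and a
    data unit, so the scarcer of the two allocations is the bottleneck.
    """
    arrivals = rng.randint(0, 2)
    waiting = min(backlog + arrivals, N_STATES - 1)
    served = min(waiting, action, TOTAL_UNITS - action)
    return served, waiting - served


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one transition."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def choose(q, s, epsilon, rng):
    """Epsilon-greedy action selection over the row q[s]."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q[s][a])


def train(ttis=20000, epsilon=0.1, seed=0):
    """Learn online, one update per TTI, with no traffic model given."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(ttis):
        a = choose(q, s, epsilon, rng)
        r, s_next = step(s, a, rng)  # reward = devices served this TTI
        q_update(q, s, a, r, s_next)
        s = s_next
    return q


if __name__ == "__main__":
    table = train()
    for backlog, row in enumerate(table):
        print(backlog, [round(v, 2) for v in row])
```

The table has one entry per (state, action) pair, which is why this approach scales poorly once several parameters for several device groups must be chosen jointly; that growth is the motivation the abstract gives for LA-Q, DQN, action aggregation, and the cooperative multi-agent variant.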
