King's College London

Research portal

Deep Reinforcement Learning for Discrete and Continuous Massive Access Control optimization

Research output: Chapter in Book/Report/Conference proceeding, Conference paper

Original language: English
Title of host publication: 2020 IEEE International Conference on Communications, ICC 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728150895
Publication status: Published - Jun 2020
Event: 2020 IEEE International Conference on Communications, ICC 2020 - Dublin, Ireland
Duration: 7 Jun 2020 to 11 Jun 2020

Publication series

Name: IEEE International Conference on Communications
Volume: 2020-June
ISSN (Print): 1550-3607

Conference

Conference: 2020 IEEE International Conference on Communications, ICC 2020
Country: Ireland
City: Dublin
Period: 7/06/2020 to 11/06/2020


Abstract

Cellular-based networks are expected to offer connectivity for massive Internet of Things (mIoT) systems. However, their Random Access CHannel (RACH) procedure suffers from unreliability due to collisions caused by simultaneous massive access. Although this collision problem has been addressed in existing RACH schemes, which organize IoT devices' transmissions and retransmissions via central control at the Base Station (BS), these schemes are usually fixed over time and thus can hardly adapt to time-varying traffic patterns. To optimize the long-term number of successfully accessing devices, this paper designs Deep Reinforcement Learning (DRL)-based optimizers, using Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG), for RACH schemes including Access Class Barring (ACB), Back-Off (BO), and Distributed Queuing (DQ). Specifically, we apply DQN to handle discrete action selection for the BO and DQ schemes, and DDPG to handle continuous action selection for the ACB scheme. Both agents integrate a Gated Recurrent Unit (GRU) network to approximate their value function/policy, which improves optimization performance by capturing temporal traffic correlations. Numerical results show that the proposed DRL-based optimizers considerably outperform conventional heuristic solutions in terms of the number of successfully accessing devices.
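To make the discrete/continuous distinction above concrete, here is a minimal sketch. It is not the authors' simulator. It models a single RACH opportunity under ACB, assuming a simplified model: N active devices, M preambles (54 here is an assumed value), a device contends only if a uniform draw falls below the barring factor p, and a preamble succeeds only if exactly one device picks it. Under these assumptions each device succeeds with probability p(1 - p/M)^(N-1). Setting the derivative of N p (1 - p/M)^(N-1) to zero gives p = M/N, so min(1, M/N) is the best barring factor when N is known. A learned policy is needed when N is unknown and time-varying. All names here are hypothetical illustrations: `rach_slot`, `expected_successes`, `heuristic_acb_factor`, `dqn_action`, `ddpg_action`, and the back-off window set. The action helpers only show how a discrete (DQN-style) choice differs from a continuous (DDPG-style) one.

```python
import random
from collections import Counter

# Hypothetical discrete BO action set (back-off windows in ms), for illustration only.
BACKOFF_WINDOWS_MS = [0, 10, 20, 40, 80, 160]


def rach_slot(num_devices, num_preambles, acb_factor, rng):
    """Simulate one RACH opportunity and return the number of successful devices."""
    # ACB: a device contends only if its uniform draw falls below the barring factor.
    contenders = sum(1 for _ in range(num_devices) if rng.random() < acb_factor)
    # Every contender picks one preamble uniformly at random.
    picks = Counter(rng.randrange(num_preambles) for _ in range(contenders))
    # A preamble picked by exactly one device is collision-free.
    return sum(1 for count in picks.values() if count == 1)


def expected_successes(num_devices, num_preambles, acb_factor):
    """Closed-form mean of rach_slot: N * p * (1 - p/M)**(N - 1)."""
    n, m, p = num_devices, num_preambles, acb_factor
    if n == 0:
        return 0.0
    return n * p * (1 - p / m) ** (n - 1)


def heuristic_acb_factor(num_devices, num_preambles):
    """Barring factor maximizing expected_successes when N is known: min(1, M/N)."""
    if num_devices == 0:
        return 1.0
    return min(1.0, num_preambles / num_devices)


def dqn_action(q_values):
    """Discrete selection (DQN-style): index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def ddpg_action(actor_output, noise_std, rng):
    """Continuous selection (DDPG-style): Gaussian-perturbed actor output clipped to [0, 1]."""
    return min(1.0, max(0.0, actor_output + rng.gauss(0.0, noise_std)))


if __name__ == "__main__":
    rng = random.Random(0)
    n_devices, n_preambles = 100, 54  # assumed load and preamble count
    p_star = heuristic_acb_factor(n_devices, n_preambles)
    trials = 2000
    mean = sum(rach_slot(n_devices, n_preambles, p_star, rng) for _ in range(trials)) / trials
    print(f"p*={p_star:.2f}  simulated={mean:.2f}  "
          f"analytic={expected_successes(n_devices, n_preambles, p_star):.2f}")
    # A discrete agent outputs an index into the BO action set.
    window = BACKOFF_WINDOWS_MS[dqn_action([0.1, 0.7, 0.3, 0.2, 0.0, -0.5])]
    print("BO window (ms):", window)
```

The heuristic needs the true device count N, which the BS does not directly observe. A learned policy that reacts only to observed outcomes (for example, counts of successful and collided preambles) removes that requirement. The exact observation and reward design used in the paper is not reproduced here.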


© 2018 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454