Non-episodic and Heterogeneous Environment in Distributed Multi-agent Reinforcement Learning

Research output: Contribution to conference › Paper › peer-review

2 Citations (Scopus)

Abstract

Reinforcement learning (RL) is an efficient intelligent algorithm for solving radio resource management problems in wireless communication networks. However, for large-scale networks with limited centralization (i.e., a high-latency connection to the central server or a capacity-limited backbone), it is not realistic to employ a centralized RL algorithm to perform joint real-time decision-making for the entire network, which calls for scalable algorithm designs. Multi-agent RL, which allows separate local execution of policies, has been applied to large-scale wireless communication problems. However, its performance varies largely with different system settings. In this paper, we study a multi-agent algorithm for a coordinated multipoint (CoMP) scenario, which requires cooperation between base stations. We show that common settings of the user distribution, the design of the reward, and an episodic environment can significantly ease learning and yield good convergence results. However, these settings are not realistic in wireless communication. By validating the performance differences between these settings with our algorithm in the CoMP scenario, we introduce several possible solutions and highlight the necessity of further study in this area.
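To make the episodic vs. non-episodic distinction mentioned in the abstract concrete, here is a minimal sketch of the two interaction loops. This is not the paper's code: the toy environment, the interference-style shared reward, and the fixed placeholder policy are all illustrative assumptions.

```python
import random

class ToyMultiAgentEnv:
    """Toy stand-in for a CoMP-style setting: each agent (base station)
    picks a transmit-power level; a shared reward rises with own power
    but falls with aggregate interference. Purely illustrative."""

    def __init__(self, n_agents=3, horizon=10, seed=0):
        self.n_agents = n_agents
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # Episodic setting: the environment is periodically reset.
        self.t = 0
        return [self.rng.random() for _ in range(self.n_agents)]

    def step(self, actions):
        self.t += 1
        # Shared reward: own power helps, aggregate interference hurts.
        reward = sum(actions) - 0.5 * sum(actions) ** 2
        done = self.t >= self.horizon  # only meaningful in the episodic case
        obs = [self.rng.random() for _ in range(self.n_agents)]
        return obs, reward, done

def run_episodic(env, n_episodes=5):
    """Episodic loop: the learner regularly gets a fresh start."""
    total = 0.0
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        while not done:
            actions = [0.3 for _ in obs]  # placeholder fixed policy
            obs, r, done = env.step(actions)
            total += r
    return total

def run_non_episodic(env, n_steps=50):
    """Non-episodic loop: one endless episode, no restarts to learn from."""
    obs = env.reset()  # single initial reset; the stream never restarts
    total = 0.0
    for _ in range(n_steps):
        actions = [0.3 for _ in obs]
        obs, r, _ = env.step(actions)  # 'done' is ignored here
        total += r
    return total
```

In the episodic loop the learner repeatedly revisits the same start-state distribution, which is exactly the convenience the paper argues is unrealistic for a continuously operating wireless network.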

Original language: English
Pages: 1019-1024
Number of pages: 6
DOIs
Publication status: Published - 2022
Event: 2022 IEEE Global Communications Conference, GLOBECOM 2022 - Virtual, Online, Brazil
Duration: 4 Dec 2022 - 8 Dec 2022

Conference

Conference: 2022 IEEE Global Communications Conference, GLOBECOM 2022
Country/Territory: Brazil
City: Virtual, Online
Period: 4/12/2022 - 8/12/2022
