TY - CONF
T1 - Non-episodic and Heterogeneous Environment in Distributed Multi-agent Reinforcement Learning
AU - Hu, Fenghe
AU - Deng, Yansha
AU - Aghvami, A. Hamid
N1 - Funding Information:
F. Hu, Y. Deng, and A. H. Aghvami are with King's College London, UK (E-mail: fenghe.hu, [email protected]). This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), UK, under Grant EP/W004348/1.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Reinforcement learning (RL) is an efficient intelligent algorithm for solving radio resource management problems in wireless communication networks. However, for large-scale networks with limited centralization (i.e., a high-latency connection to the central server or a capacity-limited backbone), it is not realistic to employ a centralized RL algorithm to perform joint real-time decision-making for the entire network, which calls for scalable algorithm designs. Multi-agent RL, which allows separate local execution of policies, has been applied to large-scale wireless communication problems. However, its performance varies largely with different system settings. In this paper, we study a multi-agent algorithm for a coordinated multipoint (CoMP) scenario, which requires cooperation between base stations. We show that common settings of user distribution, the design of the reward, and an episodic environment can significantly ease the learning of the algorithm and yield clean convergence results. However, these settings are not realistic in wireless communication. By validating the performance differences between these settings with our algorithm in the CoMP scenario, we introduce several possible solutions and highlight the necessity of further study in this area.
AB - Reinforcement learning (RL) is an efficient intelligent algorithm for solving radio resource management problems in wireless communication networks. However, for large-scale networks with limited centralization (i.e., a high-latency connection to the central server or a capacity-limited backbone), it is not realistic to employ a centralized RL algorithm to perform joint real-time decision-making for the entire network, which calls for scalable algorithm designs. Multi-agent RL, which allows separate local execution of policies, has been applied to large-scale wireless communication problems. However, its performance varies largely with different system settings. In this paper, we study a multi-agent algorithm for a coordinated multipoint (CoMP) scenario, which requires cooperation between base stations. We show that common settings of user distribution, the design of the reward, and an episodic environment can significantly ease the learning of the algorithm and yield clean convergence results. However, these settings are not realistic in wireless communication. By validating the performance differences between these settings with our algorithm in the CoMP scenario, we introduce several possible solutions and highlight the necessity of further study in this area.
UR - http://www.scopus.com/inward/record.url?scp=85146960157&partnerID=8YFLogxK
U2 - 10.1109/GLOBECOM48099.2022.10000672
DO - 10.1109/GLOBECOM48099.2022.10000672
M3 - Paper
AN - SCOPUS:85146960157
SP - 1019
EP - 1024
T2 - 2022 IEEE Global Communications Conference, GLOBECOM 2022
Y2 - 4 December 2022 through 8 December 2022
ER -