Abstract
Terahertz (THz) non-orthogonal multiple access (NOMA) networks hold great potential for next-generation wireless communications by promising ultra-high data rates and user fairness. In THz-NOMA networks, efficient and effective long-term beamforming-bandwidth-power (BBP) allocation remains an open problem due to its non-deterministic polynomial-time hard (NP-hard) nature. In this article, the continuous nature of power and sub-array ratio assignment and the discrete nature of sub-band allocation are treated carefully. In light of these attributes, an offline hybrid discrete and continuous actions (DISCO) multitask deep reinforcement learning (DRL) algorithm is proposed to maximize the long-term throughput. Specifically, multi-task learning enables the actor of DISCO to smartly integrate two state-of-the-art DRL algorithms, namely actor-critic (AC), which selects only discrete actions, and deep deterministic policy gradient (DDPG), which generates only continuous actions. Rigorous theoretical derivations for the neural network design and backpropagation process are provided to tailor the proposed DISCO to the BBP problem. Compared to the benchmark no-learning and conventional DRL algorithms, DISCO enhances the network throughput while achieving good fairness among users. Furthermore, DISCO requires only hundreds of milliseconds of computation time, demonstrating its practicality.
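The core idea described above — an actor that emits both a discrete action (sub-band selection, AC-style) and continuous actions (power and sub-array ratios, DDPG-style) — can be illustrated with a minimal sketch. This is not the paper's network: the layer shapes, state dimension, and heads below are hypothetical placeholders chosen only to show a hybrid discrete/continuous action head.

```python
import math
import random

random.seed(0)

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def linear(state, weights):
    """Toy fully connected layer: each row of `weights` produces one output."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]

def hybrid_action(state, w_discrete, w_continuous):
    """Return one discrete sub-band index (categorical head) and
    continuous ratios squashed into (0, 1) (deterministic head)."""
    # Discrete head: sample a sub-band from a softmax policy (AC-style).
    probs = softmax(linear(state, w_discrete))
    subband = random.choices(range(len(probs)), weights=probs)[0]
    # Continuous head: sigmoid keeps power/sub-array ratios in (0, 1).
    ratios = [1.0 / (1.0 + math.exp(-z)) for z in linear(state, w_continuous)]
    return subband, ratios

# Hypothetical sizes: 4-dim state, 3 sub-bands, 2 continuous ratios.
state = [0.2, -0.1, 0.5, 0.3]
w_d = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
w_c = [[random.gauss(0, 1) for _ in range(4)] for _ in range(2)]
subband, ratios = hybrid_action(state, w_d, w_c)
```

In the actual algorithm, both heads would share a trained backbone and be updated through the multi-task backpropagation derived in the paper; this sketch only shows how one forward pass can yield a mixed discrete/continuous action.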
| | |
|---|---|
| Original language | English |
| Pages (from-to) | 11647-11663 |
| Number of pages | 17 |
| Journal | IEEE Transactions on Vehicular Technology |
| Volume | 73 |
| Issue number | 8 |
| DOIs | |
| Publication status | Published - 2024 |
Keywords
- Deep Reinforcement Learning (DRL)
- Hybrid power systems
- Multitasking
- NOMA
- Non-Orthogonal Multiple Access (NOMA)
- Resource management
- Terahertz (THz) networks
- Terahertz communications
- Throughput
- Wireless communication