TY - JOUR
T1 - Adaptive UAV-Trajectory Optimization under Quality of Service Constraints
T2 - A Model-Free Solution
AU - Cui, Jingjing
AU - Ding, Zhiguo
AU - Deng, Yansha
AU - Nallanathan, Arumugam
AU - Hanzo, Lajos
PY - 2020
Y1 - 2020
N2 - Unmanned aerial vehicles (UAVs), with their potential for providing reliable high-rate connectivity, are becoming a promising component of future wireless networks. A UAV collects data from a set of randomly distributed sensors, where both the locations of these sensors and the volume of data to be transmitted are unknown to the UAV. To assist the UAV in finding the optimal motion trajectory in the face of this uncertainty, whilst aiming to maximize the cumulative collected data, we formulate a reinforcement learning problem by modelling the motion trajectory as a Markov decision process with the UAV acting as the learning agent. Then, we propose a pair of novel trajectory optimization algorithms based on stochastic modelling and reinforcement learning, which allow the UAV to optimize its flight trajectory without the need for system identification. More specifically, by dividing the considered region into small tiles, we conceive state-action-reward-state-action (Sarsa) and Q-learning based UAV-trajectory optimization algorithms (i.e., SUTOA and QUTOA) aiming to maximize the cumulative data collected during the finite flight-time. Our simulation results demonstrate that both of the proposed approaches are capable of finding an optimal trajectory under the flight-time constraint. The preference for QUTOA vs. SUTOA depends on the relative positions of the UAV's start and end points.
AB - Unmanned aerial vehicles (UAVs), with their potential for providing reliable high-rate connectivity, are becoming a promising component of future wireless networks. A UAV collects data from a set of randomly distributed sensors, where both the locations of these sensors and the volume of data to be transmitted are unknown to the UAV. To assist the UAV in finding the optimal motion trajectory in the face of this uncertainty, whilst aiming to maximize the cumulative collected data, we formulate a reinforcement learning problem by modelling the motion trajectory as a Markov decision process with the UAV acting as the learning agent. Then, we propose a pair of novel trajectory optimization algorithms based on stochastic modelling and reinforcement learning, which allow the UAV to optimize its flight trajectory without the need for system identification. More specifically, by dividing the considered region into small tiles, we conceive state-action-reward-state-action (Sarsa) and Q-learning based UAV-trajectory optimization algorithms (i.e., SUTOA and QUTOA) aiming to maximize the cumulative data collected during the finite flight-time. Our simulation results demonstrate that both of the proposed approaches are capable of finding an optimal trajectory under the flight-time constraint. The preference for QUTOA vs. SUTOA depends on the relative positions of the UAV's start and end points.
KW - Reinforcement learning
KW - sensor data collection
KW - trajectory optimization
KW - UAV communications
UR - http://www.scopus.com/inward/record.url?scp=85087542818&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3001752
DO - 10.1109/ACCESS.2020.3001752
M3 - Article
AN - SCOPUS:85087542818
SN - 2169-3536
VL - 8
SP - 112253
EP - 112265
JO - IEEE Access
JF - IEEE Access
M1 - 9114970
ER -