Reinforcement-learning based control for nonlinear systems

Student thesis: Doctoral Thesis, Doctor of Philosophy


In control theory, the development and analysis of control systems typically depend heavily on a mathematical model of the dynamic system. A variety of conventional control strategies have been proposed based on the linearization of such models. However, precise mathematical models of nonlinear systems are often difficult to obtain because of the complex properties of these systems, and inaccurate or overly complex models make proper linearization considerably harder, which limits the applicability of the above-mentioned control techniques. In addition, control performance can be significantly degraded by variations in the system dynamics, parameter uncertainty, measurement error and external disturbances. These issues make the development of control strategies for nonlinear systems a challenging problem. Reinforcement learning (RL) algorithms, by contrast, place fewer requirements on knowledge of the dynamic model: the control policy can be iteratively optimized through interaction with the environment by learning from either online or offline experience. RL algorithms therefore show high potential as an alternative route to optimal control policies for nonlinear systems.

Given the limitations of conventional control strategies and the advantages RL algorithms have shown in nonlinear system control, this thesis explores the development of control systems based on RL algorithms. The research focuses on the design and development of optimal control strategies for nonlinear systems that infuse conventional control techniques into RL algorithms, aiming to enhance system performance in both transient response and robustness from a control perspective, and to improve learning efficiency and training stability from an RL perspective. Several hybrid models are proposed in which a conventional PID controller or a fuzzy logic system (FLS) forms part of the RL learning mechanism. The performance of the trained control systems is compared and analysed on nonlinear platforms in simulation environments, where the hybrid models show advantages in both transient properties and robustness. The main contributions of the thesis fall into three parts, presented as follows:

1) The first work is presented in Chapter 3. An innovative adaptive PID controller structure based on the Q-learning algorithm (the Q-PID controller) is proposed, which provides an RL-based training scheme for multiple PID controllers in order to improve their transient performance and adaptability in complex environments. An adaptive learning-rate scheme is applied to accelerate the learning process. The proposed controller is tested on an inverted pendulum system in a simulation environment against two baselines: a conventional PID controller and a controller using the Q-learning algorithm alone. The simulation results indicate that the Q-PID controller outperforms both baselines in generality and stability.
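To illustrate the kind of mechanism described above, the following is a minimal sketch of Q-learning used to schedule PID gains with a visit-count-based adaptive learning rate. The gain sets, state discretisation and toy reward are illustrative assumptions, not the thesis's actual Q-PID design.

```python
import numpy as np

# Hypothetical sketch: the agent picks one of several candidate PID gain
# sets per discretised tracking-error state and updates a tabular Q-function.
GAIN_SETS = [(2.0, 0.1, 0.5), (5.0, 0.5, 1.0), (8.0, 1.0, 2.0)]  # (Kp, Ki, Kd)
N_STATES = 10
GAMMA = 0.95

def discretise(error, lo=-1.0, hi=1.0, n=N_STATES):
    """Map a continuous tracking error onto a state index."""
    e = np.clip(error, lo, hi)
    return min(int((e - lo) / (hi - lo) * n), n - 1)

def q_update(Q, visits, s, a, r, s_next):
    """One Q-learning step with a per-state-action decaying learning rate."""
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a]            # adaptive learning rate (assumption)
    td_target = r + GAMMA * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

Q = np.zeros((N_STATES, len(GAIN_SETS)))
visits = np.zeros_like(Q)

# One toy interaction: the chosen gain set is assumed to halve the error,
# and the reward penalises the squared error that results.
s = discretise(0.8)
a = int(Q[s].argmax())
error_next = 0.4
q_update(Q, visits, s, a, -error_next ** 2, discretise(error_next))
```

In a full loop, the selected `GAIN_SETS[a]` would parameterise the PID law applied to the plant at each step, and an exploration strategy (e.g. epsilon-greedy) would replace the plain `argmax`.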

2) The second work is presented in Chapter 4, where an adaptive neuro-fuzzy PID controller, implemented as the actor in the twin delayed deep deterministic policy gradient (TD3) algorithm, is developed to address the training challenge caused by the increased number of parameters in the system. A linear PID controller based on the TD3 algorithm is also provided for comparison. The input values are infused with fuzzy information, and a specially designed neuro-fuzzy PID controller serves as the actor approximator in the RL algorithm. The optimized controller is tested in an inverted pendulum simulation environment, where it shows advantages over the comparison controller in both generalization and robustness tests.
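The core idea of using a PID-structured actor can be sketched as follows: the actor is linear in PID features (error, its integral, its derivative), so the learnable parameters are the gains themselves, and TD3's Polyak averaging applies directly to them. This is a simplified illustration under assumed names and constants, not the thesis's neuro-fuzzy architecture.

```python
import numpy as np

class PIDActor:
    """Actor whose learnable weights are PID gains (illustrative sketch)."""
    def __init__(self, kp=1.0, ki=0.0, kd=0.0, dt=0.01):
        self.w = np.array([kp, ki, kd], dtype=float)
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def features(self, error):
        """PID feature vector: [e, integral of e, derivative of e]."""
        self.integral += error * self.dt
        deriv = (error - self.prev_error) / self.dt
        self.prev_error = error
        return np.array([error, self.integral, deriv])

    def act(self, error):
        return float(self.w @ self.features(error))

def soft_update(target_w, online_w, tau=0.005):
    """TD3-style Polyak averaging of target-network parameters."""
    return (1 - tau) * target_w + tau * online_w
```

In a TD3 loop, `self.w` would be updated from the deterministic policy gradient through the critic (delayed relative to the critic updates), while `soft_update` keeps the target actor slowly tracking the online one.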

3) In Chapter 5, the developed approach extends the idea of Chapter 4 by applying an FLS as the actor approximator in an off-policy actor-critic algorithm. The proposed method reduces the complexity of optimizing an FLS with an RL algorithm, especially in the interval type-2 (IT2) case, and enhances the robustness and control performance of the optimized controller. The FLS is extended from type-1 to IT2 with a more flexible learning architecture in which the parameters of both the antecedents and the consequents are adjustable during optimization. Detailed update rules for the actor approximator in the proposed RL algorithm are provided. Two other controllers are provided as actor function approximators for comparison: a type-1 fuzzy PD (T1-FPD) controller and a neuro-PD controller. The update rules of the T1-FPD controller are also derived as a special case of the IT2-FPD controller. The proposed controllers are tested on the inverted pendulum system to compare their transient response and robustness, and the advantage of the IT2-FPD actor approximator over the two comparison controllers is verified.
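For readers unfamiliar with IT2 fuzzy inference, the following sketch shows the basic structure of an IT2 fuzzy PD controller: each antecedent is a Gaussian set with uncertain width, giving a lower and an upper firing strength per rule. Nie-Tan type reduction (averaging the two firing strengths) is assumed here in place of the full Karnik-Mendel procedure for brevity; the rule base and all parameters are illustrative, not the thesis's IT2-FPD design.

```python
import numpy as np

def it2_gauss(x, centre, sigma_lo, sigma_hi):
    """Lower/upper membership grades of an IT2 Gaussian set
    whose footprint of uncertainty comes from an uncertain width."""
    mu_lo = np.exp(-0.5 * ((x - centre) / sigma_lo) ** 2)
    mu_hi = np.exp(-0.5 * ((x - centre) / sigma_hi) ** 2)
    return mu_lo, mu_hi

def it2_fpd(error, d_error, centres, sigmas, consequents):
    """IT2 fuzzy PD inference with product t-norm and Nie-Tan reduction.
    centres: per-rule (error_centre, d_error_centre) pairs
    sigmas:  per-rule (sigma_lo, sigma_hi) pairs
    consequents: per-rule crisp outputs (the tunable consequent parameters)"""
    f_lo, f_hi = [], []
    for (ce, cde), (s_lo, s_hi) in zip(centres, sigmas):
        e_lo, e_hi = it2_gauss(error, ce, s_lo, s_hi)
        de_lo, de_hi = it2_gauss(d_error, cde, s_lo, s_hi)
        f_lo.append(e_lo * de_lo)
        f_hi.append(e_hi * de_hi)
    f = 0.5 * (np.array(f_lo) + np.array(f_hi))   # Nie-Tan type reduction
    return float(f @ np.asarray(consequents) / f.sum())
```

In an actor-critic setting, the centres, widths and consequents above would all be gradient-updated through the critic; collapsing `sigma_lo == sigma_hi` recovers a type-1 fuzzy PD controller as a special case, mirroring the T1-FPD/IT2-FPD relationship described in the text.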
Date of Award: 1 Aug 2022
Original language: English
Awarding Institution: King's College London
Supervisors: Hongbin Liu (Supervisor) & Hak-Keung Lam (Supervisor)
