Accelerating Learning Robot Manipulation Tasks by Using Curriculum Learning and Task Decomposition

Student thesis: Doctoral Thesis (Doctor of Philosophy)

Abstract

Deep reinforcement learning (DRL) has been widely used to solve high-dimensional continuous robotic control tasks. DRL methods provide a learning paradigm that enables a robot to learn a policy end-to-end through trial and error with little prior knowledge. This dissertation applies deep reinforcement learning to several fundamental robotic manipulation problems: target reaching, pick and place, and obstacle avoidance. A curriculum learning strategy is adopted to accelerate the training of the target reaching task, and a novel fuzzy adaptive curriculum learning (FACL) strategy is proposed to generate appropriate curricula based on the real-time performance of the agent. Furthermore, a novel task decomposition method is leveraged to learn the pick and place strategy, decomposing the task into four low-level subtasks: approaching, picking, moving, and placing.

The dissertation starts by introducing the application of deep reinforcement learning methods to a range of robotic control tasks. Preliminary knowledge of reinforcement learning and Markov decision processes (MDPs) is presented. Several model-free reinforcement learning algorithms, such as Q-learning, DQN, and DDPG, are described, followed by curriculum learning and fuzzy inference systems (FIS).
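For reference, the temporal-difference update at the core of these value-based methods is standard; in tabular Q-learning it reads

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],

where \alpha is the learning rate and \gamma is the discount factor. DQN approximates Q(s, a) with a neural network, and DDPG extends the approach to continuous action spaces with an actor-critic pair.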

To solve these robotic tasks within the reinforcement learning paradigm, the problems are formalized as MDPs. Two target reaching tasks are designed for a 6-DOF robot manipulator. The first task considers only the position of the end-effector, with the target position randomly sampled within a designated 3D workspace. The second task considers both the target's position and orientation. In the pick and place task, the grasped objects are regularly shaped objects such as cuboid boxes, which simplifies the picking operation of a two-finger gripper. Additionally, a DRL-based controller is proposed to handle the obstacle avoidance task.
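As an illustration of what such an MDP formulation might look like, the following is a minimal, hypothetical gym-style sketch of the position-only reaching task in Python; the state, action, and reward definitions (concatenated end-effector and target positions, small Cartesian displacements, a negative-distance reward with a success bonus) are assumptions for illustration, not the dissertation's exact design.

    import numpy as np

    class ReachEnv:
        # Hypothetical MDP for a position-only target reaching task.

        def __init__(self, workspace=1.0, step_size=0.05, tol=0.02):
            self.workspace = workspace  # half-width of the cubic 3D workspace
            self.step_size = step_size  # scale of one Cartesian displacement
            self.tol = tol              # distance threshold for success

        def reset(self):
            # Sample the target uniformly within the designated workspace.
            self.target = np.random.uniform(-self.workspace, self.workspace, 3)
            self.ee_pos = np.zeros(3)
            return np.concatenate([self.ee_pos, self.target])

        def step(self, action):
            # Apply a clipped Cartesian displacement to the end-effector.
            self.ee_pos = self.ee_pos + np.clip(action, -1.0, 1.0) * self.step_size
            dist = float(np.linalg.norm(self.ee_pos - self.target))
            done = dist < self.tol
            reward = -dist + (10.0 if done else 0.0)  # assumed reward shaping
            return np.concatenate([self.ee_pos, self.target]), reward, done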

The dissertation then analyzes the forward kinematics of the 6-DOF robot manipulator. Based on the kinematic model, a 3D robot simulator is developed to simulate the motion of the robot. A Python interface is set up between the RL agent and the simulator to control the robot manipulator. In addition, the PyBullet physics engine is utilized to build a virtual platform to display the simulation results more intuitively.
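A minimal sketch of how such a PyBullet visualization platform might be wired up is shown below; the KUKA iiwa model bundled with pybullet_data stands in for the dissertation's 6-DOF manipulator, whose URDF is not given here, and the joint targets are arbitrary.

    import pybullet as p
    import pybullet_data

    p.connect(p.DIRECT)  # use p.GUI for an interactive window
    p.setAdditionalSearchPath(pybullet_data.getDataPath())
    p.setGravity(0, 0, -9.81)
    p.loadURDF("plane.urdf")
    robot = p.loadURDF("kuka_iiwa/model.urdf", useFixedBase=True)  # stand-in arm

    # Command joint positions (e.g., produced by the RL agent), then step physics.
    targets = [0.0, 0.3, 0.0, -1.2, 0.0, 0.8, 0.0]
    p.setJointMotorControlArray(robot, list(range(7)), p.POSITION_CONTROL,
                                targetPositions=targets)
    for _ in range(240):  # one simulated second at the default 240 Hz
        p.stepSimulation()

    # Read back the end-effector pose, as needed for reward computation.
    ee_pos = p.getLinkState(robot, 6)[0]
    print("end-effector position:", ee_pos)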

The dissertation also introduces several common decay functions used in continuous curriculum learning. Curriculum learning is a strategy that generates a sequence of appropriately ordered subtasks for the agent to train on, improving learning efficiency and final performance. A fuzzy adaptive curriculum learning (FACL) strategy is proposed to automatically generate a continuous curriculum based on the agent's real-time performance. Throughout the dissertation, the proposed FACL strategy is shown to improve learning efficiency, and it is more flexible and adaptable than traditional curriculum learning methods.
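For context, two decay functions commonly used for such fixed curricula are linear and exponential schedules; the sketch below is illustrative, with assumed parameter names, and the FACL strategy replaces this kind of fixed schedule with a fuzzy inference system driven by the agent's performance.

    import math

    def linear_decay(t, horizon, start=1.0, end=0.0):
        # Anneal a difficulty parameter linearly from start to end.
        return start + (end - start) * min(t / horizon, 1.0)

    def exponential_decay(t, rate=1e-3, start=1.0):
        # Decay a difficulty parameter exponentially at the given rate.
        return start * math.exp(-rate * t)

    # Example: shrink the target-sampling radius as training progresses.
    for t in (0, 1000, 5000):
        print(t, linear_decay(t, 5000), exponential_decay(t))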

Furthermore, this dissertation elaborates on the task decomposition strategy. A novel task decomposition method for the pick and place task is proposed: the task is decomposed into four low-level subtasks that can be trained independently. Straightforward and effective pick and place operations are devised for a two-finger gripper. In addition, a high-level controller is designed to instruct the low-level controllers so that the tasks are accomplished efficiently. The results suggest that the proposed method substantially improves training efficiency and stability.
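One plausible realization of such a high-level controller is a simple state machine that sequences the four trained subtask policies; the interface below (a dict of policies and a per-subtask success signal) is an assumption for illustration, not the dissertation's actual design.

    class PickAndPlaceController:
        # Hypothetical high-level controller sequencing four subtask policies.
        SUBTASKS = ("approach", "pick", "move", "place")

        def __init__(self, policies):
            self.policies = policies  # maps subtask name -> low-level policy
            self.stage = 0

        def act(self, obs, subtask_done):
            # Advance to the next subtask once the current one succeeds.
            if subtask_done and self.stage < len(self.SUBTASKS) - 1:
                self.stage += 1
            return self.policies[self.SUBTASKS[self.stage]](obs)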

Finally, this dissertation introduces the Robot Operating System (ROS) and the control methods used for the robots. A physical platform featuring a 6-DOF robot manipulator and a two-finger gripper has been developed, with its control system based on ROS. The learned policies are transferred to and evaluated on this real-world platform. The experiments indicate that policies learned in simulation can be applied to real robots and achieve satisfactory outcomes.
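As an example of how a learned joint command might be sent to such a ROS-controlled arm, the following sketch publishes a single trajectory point; the node, topic, and joint names are placeholders assuming a standard ros_control JointTrajectory interface, not the thesis's actual control stack.

    import rospy
    from trajectory_msgs.msg import JointTrajectory, JointTrajectoryPoint

    rospy.init_node("policy_executor")
    pub = rospy.Publisher("/arm_controller/command", JointTrajectory, queue_size=1)

    msg = JointTrajectory()
    msg.joint_names = ["joint_%d" % i for i in range(1, 7)]  # placeholder names
    point = JointTrajectoryPoint()
    point.positions = [0.0, 0.3, -0.5, 0.0, 0.8, 0.0]  # e.g., from the policy
    point.time_from_start = rospy.Duration(1.0)
    msg.points = [point]

    rospy.sleep(0.5)  # allow the publisher to register with subscribers
    pub.publish(msg)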
Date of Award: 1 Sept 2022
Original language: English
Awarding Institution: King's College London
Supervisor: Jian Dai
