Pushing the Boundaries of Deep Reinforcement Learning by Challenging its Fundamentals

Student thesis: Doctoral Thesis › Doctor of Philosophy

Abstract

Reinforcement learning (RL) holds the promise of providing a general paradigm for solving meaningful real-world problems. However, current RL algorithms rely on large amounts of data and on access to informative reward functions, constraining much of their application to tasks that are easy to simulate and specify. Furthermore, without directly prescribing target invariances, learned agent behavior often fails to generalize or transfer even under minimal variations of the original environments. These traits are shared by a large part of the modern literature, making them likely symptomatic of some inherent limitations of the traditional deep RL framework. Based on these observations, this thesis introduces several new methods that directly challenge some of the field's current fundamental assumptions and characteristics.

First, we introduce models and practices aimed at superseding some of deep RL's ubiquitous components: inflexible models of agent behavior and suboptimal priors for learning generalizable behavior currently bottleneck the expressivity and efficacy of the deep RL paradigm. Our alternatives provide agents with both temporal and computational reasoning flexibility and produce new inductive biases that better capture key features of sequential decision-making. We show that these new components lead to more efficient algorithms and to the natural emergence of new, intuitive agent properties.

Second, we introduce new complementary methods aimed at improving the generality of the RL framework: the problem formulation and optimization challenges characterizing RL make domain-specific engineering of reward functions, training objectives, and models crucial for performance. Our new auxiliary components include learning from different forms of simple supervision, together with adaptive strategies that tune objective functions and stabilize backpropagation directly from experience. We show that incorporating these advances orthogonally improves performance and lessens reliance on manual algorithm tuning.

The unifying theme behind our contributions is to introduce new dimensions over which the RL framework can autonomously learn generalizable knowledge.
We show that this approach not only provides agents with new capabilities and improves performance across a range of diverse problems, but also produces more robust algorithms that are less dependent on domain-specific parameters and architectures. We discuss the implications of our results and related future directions to bring the RL field closer to its true potential.
Date of Award: 1 Mar 2024
Original language: English
Awarding Institution
  • King's College London
Supervisors: Oya Celiktutan Dikici (Supervisor) & Jian Dai (Supervisor)
