Locally weighted least squares policy iteration for model-free learning in uncertain environments

Matthew Howard, Yoshihiko Nakamura

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

1 Citation (Scopus)

Abstract

This paper introduces Locally Weighted Least Squares Policy Iteration for learning approximate optimal control in settings where models of the dynamics and cost function are either unavailable or hard to obtain. Building on recent advances in Least Squares Temporal Difference Learning, the proposed approach is able to learn from data collected from interactions with a system, in order to build a global control policy based on localised models of the state-action value function. Evaluations are reported characterising learning performance for non-linear control problems including an under-powered pendulum swing-up task, and a robotic door-opening problem under different dynamical conditions.
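The approach described in the abstract combines least squares policy iteration with locally weighted estimation of the state-action value function. The following is a minimal sketch of that idea, not the authors' implementation: it assumes Gaussian kernels around a fixed set of local centres, simple polynomial state-action features, and a toy 1-D task; the names (lw_lstdq, lwlspi, q_value) and the example MDP are illustrative assumptions only.

import numpy as np

def features(s, a, n_actions):
    """Joint state-action features: polynomial state features replicated per action."""
    phi_s = np.array([1.0, s, s ** 2])
    phi = np.zeros(len(phi_s) * n_actions)
    phi[a * len(phi_s):(a + 1) * len(phi_s)] = phi_s
    return phi

def lw_lstdq(samples, policy, centre, bandwidth, gamma, n_actions, reg=1e-6):
    """Locally weighted LSTD-Q: fit one linear Q-model around `centre`."""
    dim = len(features(0.0, 0, n_actions))
    A = reg * np.eye(dim)
    b = np.zeros(dim)
    for s, a, r, s_next in samples:
        w = np.exp(-0.5 * ((s - centre) / bandwidth) ** 2)   # kernel weight of this sample
        phi = features(s, a, n_actions)
        phi_next = features(s_next, policy(s_next), n_actions)
        A += w * np.outer(phi, phi - gamma * phi_next)
        b += w * phi * r
    return np.linalg.solve(A, b)

def q_value(s, a, centres, thetas, bandwidth, n_actions):
    """Blend the local linear models into a global Q estimate."""
    w = np.exp(-0.5 * ((np.asarray(centres) - s) / bandwidth) ** 2)
    w /= w.sum()
    phi = features(s, a, n_actions)
    return sum(wi * (phi @ th) for wi, th in zip(w, thetas))

def lwlspi(samples, centres, bandwidth, gamma, n_actions, n_iters=10):
    """Alternate locally weighted policy evaluation and greedy policy improvement."""
    policy = lambda s: 0   # arbitrary initial policy
    for _ in range(n_iters):
        thetas = [lw_lstdq(samples, policy, c, bandwidth, gamma, n_actions)
                  for c in centres]
        policy = lambda s, th=thetas: int(np.argmax(
            [q_value(s, a, centres, th, bandwidth, n_actions)
             for a in range(n_actions)]))
    return policy, thetas

# Toy usage: samples from a 1-D state with two actions that step left/right;
# the reward favours staying near the origin (purely illustrative).
rng = np.random.default_rng(0)
samples = []
for _ in range(500):
    s = rng.uniform(-1.0, 1.0)
    a = int(rng.integers(0, 2))
    s_next = float(np.clip(s + (0.1 if a == 1 else -0.1), -1.0, 1.0))
    samples.append((s, a, -abs(s_next), s_next))
policy, _ = lwlspi(samples, centres=[-0.8, 0.0, 0.8], bandwidth=0.4,
                   gamma=0.95, n_actions=2)
print(policy(-0.5), policy(0.5))   # greedy actions should push the state toward 0

The key point illustrated is that each local model is fitted by the same least squares temporal difference machinery, only with samples reweighted by their distance from the model's centre, and the global policy acts greedily on the kernel-blended Q estimate.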

Original language: English
Title of host publication: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Publisher: IEEE
Pages: 1223-1229
Number of pages: 7
ISBN (Print): 9781467363587
DOIs
Publication status: Published - 1 Dec 2013
Event: 2013 26th IEEE/RSJ International Conference on Intelligent Robots and Systems: New Horizon, IROS 2013 - Tokyo, Japan
Duration: 3 Nov 2013 - 8 Nov 2013

Conference

Conference: 2013 26th IEEE/RSJ International Conference on Intelligent Robots and Systems: New Horizon, IROS 2013
Country/Territory: Japan
City: Tokyo
Period: 3/11/2013 - 8/11/2013
