King's College London

Research portal

Full-Body Postural Control of a Humanoid Robot with Both Imitation Learning and Skill Innovation

Research output: Contribution to journal › Article

Standard

Full-Body Postural Control of a Humanoid Robot with Both Imitation Learning and Skill Innovation. / González-Fierro, Miguel; Balaguer, Carlos; Swann, Nicola; Nanayakkara, Thrishantha.

In: International Journal Of Humanoid Robotics, Vol. 11, No. 02, 1450012, 06.2014.


Harvard

González-Fierro, M, Balaguer, C, Swann, N & Nanayakkara, T 2014, 'Full-Body Postural Control of a Humanoid Robot with Both Imitation Learning and Skill Innovation', International Journal Of Humanoid Robotics, vol. 11, no. 02, 1450012. https://doi.org/10.1142/S0219843614500121

APA

González-Fierro, M., Balaguer, C., Swann, N., & Nanayakkara, T. (2014). Full-Body Postural Control of a Humanoid Robot with Both Imitation Learning and Skill Innovation. International Journal Of Humanoid Robotics, 11(02), [1450012]. https://doi.org/10.1142/S0219843614500121

Vancouver

González-Fierro M, Balaguer C, Swann N, Nanayakkara T. Full-Body Postural Control of a Humanoid Robot with Both Imitation Learning and Skill Innovation. International Journal Of Humanoid Robotics. 2014 Jun;11(02):1450012. https://doi.org/10.1142/S0219843614500121

Author

González-Fierro, Miguel; Balaguer, Carlos; Swann, Nicola; Nanayakkara, Thrishantha. / Full-Body Postural Control of a Humanoid Robot with Both Imitation Learning and Skill Innovation. In: International Journal Of Humanoid Robotics. 2014; Vol. 11, No. 02.

BibTeX

@article{5c7a1c20cd604720a24e8eb8ef6ea164,
title = "Full-Body Postural Control of a Humanoid Robot with Both Imitation Learning and Skill Innovation",
abstract = "In this paper, we present a novel methodology to obtain imitative and innovative postural movements in a humanoid based on human demonstrations in a different kinematic scale. We collected motion data from a group of human participants standing up from a chair. Modeling the human as an actuated 3-link kinematic chain, and by defining a multi-objective reward function of zero moment point and joint torques to represent the stability and effort, we computed reward profiles for each demonstration. Since individual reward profiles show variability across demonstrating trials, the underlying state transition probabilities were modeled using a Markov chain. Based on the argument that the reward profiles of the robot should show the same temporal structure of those of the human, we used differential evolution to compute a trajectory that fits all humanoid constraints and minimizes the difference between the robot reward profile and the predicted profile if the robot imitates the human. Therefore, robotic imitation involves developing a policy that results in a temporal reward structure, matching that of a group of human demonstrators across an array of demonstrations. Skill innovation was achieved by optimizing a signed reward error after imitation was achieved. Experimental results using the humanoid HOAP-3 are shown.",
keywords = "Humanoid walking, learning based on demonstrations",
author = "Miguel Gonz{\'a}lez-Fierro and Carlos Balaguer and Nicola Swann and Thrishantha Nanayakkara",
year = "2014",
month = jun,
doi = "10.1142/S0219843614500121",
language = "English",
volume = "11",
journal = "International Journal Of Humanoid Robotics",
issn = "0219-8436",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "02",
pages = "1450012",
}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Full-Body Postural Control of a Humanoid Robot with Both Imitation Learning and Skill Innovation

AU - González-Fierro, Miguel

AU - Balaguer, Carlos

AU - Swann, Nicola

AU - Nanayakkara, Thrishantha

PY - 2014/6

Y1 - 2014/6

N2 - In this paper, we present a novel methodology to obtain imitative and innovative postural movements in a humanoid based on human demonstrations in a different kinematic scale. We collected motion data from a group of human participants standing up from a chair. Modeling the human as an actuated 3-link kinematic chain, and by defining a multi-objective reward function of zero moment point and joint torques to represent the stability and effort, we computed reward profiles for each demonstration. Since individual reward profiles show variability across demonstrating trials, the underlying state transition probabilities were modeled using a Markov chain. Based on the argument that the reward profiles of the robot should show the same temporal structure of those of the human, we used differential evolution to compute a trajectory that fits all humanoid constraints and minimizes the difference between the robot reward profile and the predicted profile if the robot imitates the human. Therefore, robotic imitation involves developing a policy that results in a temporal reward structure, matching that of a group of human demonstrators across an array of demonstrations. Skill innovation was achieved by optimizing a signed reward error after imitation was achieved. Experimental results using the humanoid HOAP-3 are shown.

AB - In this paper, we present a novel methodology to obtain imitative and innovative postural movements in a humanoid based on human demonstrations in a different kinematic scale. We collected motion data from a group of human participants standing up from a chair. Modeling the human as an actuated 3-link kinematic chain, and by defining a multi-objective reward function of zero moment point and joint torques to represent the stability and effort, we computed reward profiles for each demonstration. Since individual reward profiles show variability across demonstrating trials, the underlying state transition probabilities were modeled using a Markov chain. Based on the argument that the reward profiles of the robot should show the same temporal structure of those of the human, we used differential evolution to compute a trajectory that fits all humanoid constraints and minimizes the difference between the robot reward profile and the predicted profile if the robot imitates the human. Therefore, robotic imitation involves developing a policy that results in a temporal reward structure, matching that of a group of human demonstrators across an array of demonstrations. Skill innovation was achieved by optimizing a signed reward error after imitation was achieved. Experimental results using the humanoid HOAP-3 are shown.

KW - Humanoid walking

KW - learning based on demonstrations

U2 - 10.1142/S0219843614500121

DO - 10.1142/S0219843614500121

M3 - Article

VL - 11

JO - International Journal Of Humanoid Robotics

JF - International Journal Of Humanoid Robotics

SN - 0219-8436

IS - 02

M1 - 1450012

ER -

