Abstract
We contribute a method for improving the skill execution performance of a robot by complementing an existing algorithmic solution with corrective human demonstration. We apply the proposed method to biped walking, a good example of a complex low-level skill due to the complicated dynamics of the walk process in a high-dimensional state and action space. We introduce an incremental learning approach to improve the stability of the Nao humanoid robot while walking. First, we identify, extract, and record a complete walk cycle from the motion of the robot as it executes a given walk algorithm as a black box. Second, we apply offline advice operators to improve the stability of the learned open-loop walk cycle. Finally, we present an algorithm that directly modifies the recorded walk cycle using real-time corrective human demonstration. The demonstrator delivers the corrective feedback through a commercially available wireless game controller, without touching the robot. Through the proposed algorithm, the robot learns a closed-loop correction policy for the open-loop walk by mapping the corrective demonstrations to the sensory readings received while walking. Experimental results demonstrate a significant improvement in walk stability.
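The closed-loop correction policy described above can be illustrated with a minimal sketch: corrective demonstrations are memorized as (sensor reading, correction) pairs, and at run time the open-loop joint command is adjusted by a distance-weighted average of the recorded corrections, in the spirit of locally weighted learning. All function names, the Gaussian kernel, and the bandwidth parameter below are illustrative assumptions, not the authors' exact formulation.

```python
import math

def learn_policy(demonstrations):
    """demonstrations: list of (sensor_vector, correction_vector) pairs.
    A lazy learner: the policy simply memorizes the demonstrated pairs."""
    return list(demonstrations)

def correction_for(policy, sensors, bandwidth=1.0):
    """Distance-weighted average of recorded corrections near `sensors`."""
    num = [0.0] * len(policy[0][1])
    den = 0.0
    for s, c in policy:
        d2 = sum((a - b) ** 2 for a, b in zip(sensors, s))
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weight
        num = [n + w * ci for n, ci in zip(num, c)]
        den += w
    return [n / den for n in num]

def corrected_command(open_loop_command, policy, sensors):
    """Add the learned correction to the open-loop walk-cycle command."""
    delta = correction_for(policy, sensors)
    return [q + dq for q, dq in zip(open_loop_command, delta)]
```

For example, with two one-dimensional demonstrations, `learn_policy([([0.0], [0.1]), ([1.0], [-0.1])])` yields a policy whose correction is positive near the first sensor reading and negative near the second, smoothly interpolating in between.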
Notes
The Nao V3 model is delivered with a new closed-loop omni-directional walk; however, that model was not available to us during our experimental study.
The mentioned frequency is for the Nao V2 model, which was the platform used in this study. The internal control software on the more recent V3 model runs at 100 Hz.
The performance of the open-loop walk was comparable to that of the original ZMP-based walk, which itself was not available for inclusion in this empirical comparison.
Acknowledgements
The authors would like to thank Tekin Meriçli and Brian Coltin for their valuable feedback on the manuscript. We further thank the Cerberus team for their debugging system, and the CMWrEagle team for their ZMP-based robot walk.
The first author is supported by The Scientific and Technological Research Council of Turkey Programme 2214 and the Turkish State Planning Organization (DPT) under the TAM Project, number 2007K120610.
Cite this article
Meriçli, Ç., Veloso, M. & Akın, H.L. Improving biped walk stability with complementary corrective demonstration. Auton Robot 32, 419–432 (2012). https://doi.org/10.1007/s10514-012-9284-1