Policy Feedback for the Refinement of Learned Motion Control on a Mobile Robot

International Journal of Social Robotics (2012)

Abstract

Motion control is fundamental to mobile robots, and the challenge of developing it can be eased by incorporating execution experience to increase policy robustness. In this work, we present an approach that updates a policy learned from demonstration with human teacher feedback. We contribute advice-operators as a feedback form that provides corrections on state-action pairs produced during a learner execution, and Focused Feedback for Mobile Robot Policies (F3MRP) as a framework for providing feedback to rapidly sampled policies. Both are appropriate for mobile robot motion control domains. We present a general feedback algorithm in which multiple types of feedback, including advice-operators, are provided through the F3MRP framework and shown to improve policies initially derived from a set of behavior examples. A comparison with providing more behavior examples instead of more feedback finds that data are generated in different areas of the state and action spaces, and that feedback is more effective at improving policy performance while producing smaller datasets.
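
To make the feedback form concrete, the following Python sketch shows how an advice-operator might correct the state-action pairs recorded over a teacher-selected segment of a learner execution, with the corrected points then added to the policy dataset. This is an illustrative sketch, not the authors' implementation: the operator, state fields, and action fields are hypothetical.

    # Hypothetical sketch of an advice-operator correcting recorded
    # state-action pairs; names and fields are illustrative only.
    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    @dataclass
    class State:
        x: float
        y: float
        theta: float

    @dataclass
    class Action:
        trans_speed: float  # translational speed command
        rot_speed: float    # rotational speed command

    AdviceOperator = Callable[[State, Action], Tuple[State, Action]]

    def slow_down(factor: float) -> AdviceOperator:
        """Example operator: scale down the translational speed."""
        def op(z: State, a: Action) -> Tuple[State, Action]:
            return z, Action(a.trans_speed * factor, a.rot_speed)
        return op

    def apply_feedback(execution: List[Tuple[State, Action]],
                       segment: range,
                       op: AdviceOperator,
                       dataset: List[Tuple[State, Action]]) -> None:
        """Apply an advice-operator over a teacher-selected segment and
        add the corrected points to the demonstration dataset."""
        for i in segment:
            z, a = execution[i]
            dataset.append(op(z, a))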

Notes

  1. The F3MRP framework was developed in the GNU Octave scientific computing language [14].

  2. The empirical validations of Sect. 4.2 employ lazy learning regression techniques [6]; specifically, a form of locally weighted averaging. Incremental policy updating is particularly straightforward under lazy learning regression, since explicit rederivation is not required; policy derivation happens at execution time, and so a complete policy update is accomplished by simply adding new data to the set (see the locally weighted averaging sketch following these notes).

  3. The positive credit flag adds the execution point, unmodified, to the dataset, and thus may equivalently be viewed as an identity-function advice-operator, i.e. f(z,a)=(z,a).

  4. This scale becomes finer, and association with the underlying data trickier, if a single value is intended to be somehow distributed across only a portion of the execution states, akin to the RL issue of reward back-propagation.

  5. A Poisson formulation was chosen since the distance calculations never fall below, and often cluster near, zero. To estimate λ, frequency counts were computed over k=50 uniformly sized bins of the distance data (see the binning sketch following these notes).

  6. The traces ξ_d and ξ_p correspond respectively to the “Prediction Data” and “Position Data” in Fig. 1; similarly for the trace subsets ξ̂_d = {x, y, θ}_Φ and ξ̂_p = {z, a}_Φ.

  7. Here an earlier version of F3MRP was employed, which did not provide visual dataset support or interactive tagging.

  8. The same teacher (one of the authors) was used to provide both demonstration and feedback.

  9. Full domain and algorithm details may be found in [4].

  10. The exceptions are when the entire learner execution receives a correction, or when the teacher provides a demonstration for only the beginning portion of an execution.

  11. In Table 2, operators 0–5 are the baseline operators and operators 6–8 were built through operator-scaffolding.

  12. Note that operator composition is not transitive.

  13. The limit is the number of unique combinations of the parameters of the child operators.

  14. If a constant value for the rate of change in action dimension j is not defined for the robot system, reasonable options include, for example, the average rate of change observed during the demonstrations.

  15. The value γ_{j,max} is defined either by the physical constraints of the robot or artificially by the control system.
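
As a companion to Note 2, below is a minimal sketch of a locally weighted averaging policy with incremental updating, written in Python rather than the GNU Octave of the original framework; the Gaussian kernel, bandwidth, and Euclidean distance are assumptions, not the paper's implementation details.

    # Minimal locally weighted averaging (lazy learning) policy sketch.
    import numpy as np

    class LWAPolicy:
        def __init__(self, bandwidth: float = 1.0):
            self.states = []    # stored observation vectors z
            self.actions = []   # stored action vectors a
            self.bandwidth = bandwidth

        def add(self, z, a):
            """Incremental policy update: simply add the new (z, a) pair."""
            self.states.append(np.asarray(z, dtype=float))
            self.actions.append(np.asarray(a, dtype=float))

        def predict(self, z):
            """Policy derivation at execution time: a distance-weighted
            average of the stored actions."""
            Z = np.vstack(self.states)
            A = np.vstack(self.actions)
            d = np.linalg.norm(Z - np.asarray(z, dtype=float), axis=1)
            w = np.exp(-(d / self.bandwidth) ** 2)
            return (w[:, None] * A).sum(axis=0) / w.sum()

A policy update from new demonstrations or feedback then amounts to calling add on each new state-action pair; no explicit rederivation step is needed, since prediction happens lazily at execution time.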

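As a companion to Note 5, the sketch below gives one plausible reading of the λ estimate: the distance data are discretized into k=50 uniformly sized bins, and λ is taken as the mean bin index, which is the Poisson maximum-likelihood estimate over the binned values. The binning-to-rate step is an assumption; the exact estimator is not spelled out in the note.

    # Sketch of a lambda estimate from binned distance data (assumed estimator).
    import numpy as np

    def estimate_lambda(distances, k: int = 50) -> float:
        d = np.asarray(distances, dtype=float)
        edges = np.linspace(d.min(), d.max(), k + 1)   # k uniformly sized bins
        counts, _ = np.histogram(d, bins=edges)        # frequency counts per bin
        bin_idx = np.arange(k)
        # Mean bin index = Poisson MLE over the binned observations.
        return float((counts * bin_idx).sum() / counts.sum())
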
References

  1. Abbeel P, Coates A, Quigley M, Ng AY (2007) An application of reinforcement learning to aerobatic helicopter flight. In: Proceedings of advances in neural information processing systems

  2. Argall B, Browning B, Veloso M (2008) Learning robot motion control with demonstration and advice-operators. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems

  3. Argall B, Browning B, Veloso M (2009) Automatic weight learning for multiple data sources when learning from demonstration. In: Proceedings of the IEEE international conference on robotics and automation

  4. Argall B, Browning B, Veloso M (2011) Teacher feedback to scaffold and refine demonstrated motion primitives on a mobile robot. Robot Auton Syst 59(3–4):243–255

  5. Argall B, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robot Auton Syst 57(5):469–483

  6. Atkeson CG, Moore AW, Schaal S (1997) Locally weighted learning. Artif Intell Rev 11:11–73

  7. Atkeson CG, Schaal S (1997) Robot learning from demonstration. In: Proceedings of the fourteenth international conference on machine learning (ICML’97)

  8. Bagnell JA, Schneider JG (2001) Autonomous helicopter control using reinforcement learning policy search methods. In: Proceedings of the IEEE international conference on robotics and automation

  9. Bentivegna DC (2004) Learning from observation using primitives. Ph.D. thesis, College of Computing, Georgia Institute of Technology, Atlanta, GA

  10. Billard A, Calinon S, Dillmann R, Schaal S (2008) Robot programming by demonstration. In: Siciliano B, Khatib O (eds) Handbook of robotics. Springer, New York, Chap. 59

  11. Breazeal C, Scassellati B (2002) Robots that imitate humans. Trends Cogn Sci 6(11):481–487

  12. Calinon S, Billard A (2007) Incremental learning of gestures by imitation in a humanoid robot. In: Proceedings of the 2nd ACM/IEEE international conference on human-robot interaction

  13. Chernova S, Veloso M (2008) Learning equivalent action choices from demonstration. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems

  14. Eaton JW (2002) GNU Octave manual. Network Theory Limited

  15. Grollman DH, Jenkins OC (2007) Dogged learning for robots. In: Proceedings of the IEEE international conference on robotics and automation

  16. Ijspeert AJ, Nakanishi J, Schaal S (2002) Learning rhythmic movements by demonstration using nonlinear oscillators. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems

  17. Kober J, Peters J (2009) Learning motor primitives for robotics. In: Proceedings of the IEEE international conference on robotics and automation

  18. Kolter JZ, Abbeel P, Ng AY (2008) Hierarchical apprenticeship learning with application to quadruped locomotion. In: Proceedings of advances in neural information processing systems

  19. Matarić MJ (2002) Sensory-motor primitives as a basis for learning by imitation: linking perception to action and biology to robotics. In: Dautenhahn K, Nehaniv CL (eds) Imitation in animals and artifacts. MIT Press, Cambridge, Chap. 15

  20. Nehaniv CL, Dautenhahn K (2002) The correspondence problem. In: Dautenhahn K, Nehaniv CL (eds) Imitation in animals and artifacts. MIT Press, Cambridge, Chap. 2

  21. Nicolescu M, Matarić M (2003) Methods for robot task learning: demonstrations, generalization and practice. In: Proceedings of the second international joint conference on autonomous agents and multi-agent systems

  22. Pastor P, Kalakrishnan M, Chitta S, Theodorou E, Schaal S (2011) Skill learning and task outcome prediction for manipulation. In: Proceedings of the IEEE international conference on robotics and automation

  23. Peters J, Schaal S (2008) Natural actor-critic. Neurocomputing 71(7–9):1180–1190

  24. Ratliff N, Bradley D, Bagnell JA, Chestnutt J (2007) Boosting structured prediction for imitation learning. In: Proceedings of advances in neural information processing systems

  25. Smart WD (2002) Making reinforcement learning work on real robots. Ph.D. thesis, Department of Computer Science, Brown University, Providence, RI

Acknowledgements

The research is partly sponsored by the Boeing Corporation under Grant No. CMU-BA-GTA-1, BBNT Solutions under subcontract No. 950008572, via prime Air Force contract No. SA-8650-06-C-7606, the United States Department of the Interior under Grant No. NBCH-1040007 and the Qatar Foundation for Education, Science and Community Development. The views and conclusions contained in this document are solely those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Author information

Corresponding author

Correspondence to Brenna D. Argall.

Cite this article

Argall, B.D., Browning, B. & Veloso, M.M. Policy Feedback for the Refinement of Learned Motion Control on a Mobile Robot. Int J of Soc Robotics 4, 383–395 (2012). https://doi.org/10.1007/s12369-012-0156-9
