
Efficient behavior learning in human–robot collaboration

Published in: Autonomous Robots

Abstract

We present a novel method for a robot to interactively learn, while executing, a joint human–robot task. We consider collaborative tasks carried out by a team consisting of a human operator and a robot helper that adapts to the human's task-execution preferences. Different human operators can have different abilities, experiences, and personal preferences, so that a particular allocation of activities in the team is preferred over another. Our main goal is to have the robot learn the task and the preferences of the user in order to provide a more efficient and acceptable joint task execution. We cast concurrent multi-agent collaboration as a semi-Markov decision process and show how to model the team behavior and learn the expected robot behavior. We further propose an interactive learning framework, and we evaluate it both in simulation and on a real robotic setup to show that the system can effectively learn and adapt to human expectations.
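The abstract casts the collaboration as a semi-Markov decision process in which the robot learns the operator's preferred allocation of activities while executing the task. As a rough illustration of that idea only, the sketch below implements a toy interactive preference learner; the state names, robot actions, and evidence threshold are invented for the example and are not the authors' model or code.

```python
# Toy sketch of interactive preference learning in a human-robot team.
# The states, robot actions, and confidence rule below are illustrative
# assumptions, not the SMDP formulation or implementation from the paper.

STATES = ["idle", "part_ready", "part_held_by_human", "assembly_started"]
ROBOT_ACTIONS = ["wait", "fetch_part", "hold_part", "hand_over"]

# counts[s][a]: how often the operator endorsed robot action a in state s.
counts = {s: {a: 0 for a in ROBOT_ACTIONS} for s in STATES}


def robot_policy(state, min_evidence=3):
    """Return the action the operator seems to prefer in this state,
    or None if there is not yet enough evidence to act autonomously."""
    prefs = counts[state]
    best = max(prefs, key=prefs.get)
    return best if prefs[best] >= min_evidence else None


def interactive_step(state, ask_operator):
    """One execution step: act if confident, otherwise query the human,
    then update the learned preferences from the observed choice."""
    action = robot_policy(state)
    if action is None:
        action = ask_operator(state)  # human demonstrates or corrects
    counts[state][action] += 1
    return action


# Simulated operator who prefers the robot to fetch parts when one is ready.
def operator(state):
    return "fetch_part" if state == "part_ready" else "wait"


if __name__ == "__main__":
    for step in range(5):
        chosen = interactive_step("part_ready", operator)
        print(f"step {step}: robot action = {chosen}")
```

After a few queries the robot acts autonomously in states it has seen often, which mirrors the learn-while-executing loop described in the abstract.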



Notes

  1. We sometimes use the term policy to refer to a deterministic mapping from S to A (see the notation sketch below).
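For readers unfamiliar with the terminology, the two readings of "policy" can be written in standard MDP notation; this notation is my own shorthand, not quoted from the paper.

```latex
% Deterministic policy: a fixed action for each state
\pi : S \to A, \qquad a_t = \pi(s_t)

% Stochastic policy: a distribution over actions for each state
\pi(a \mid s) = \Pr(A_t = a \mid S_t = s)
```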


Acknowledgements

This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and by the EU FP7-ICT project 3rdHand under Grant Agreement No. 610878.

Author information

Corresponding author

Correspondence to Manuel Lopes.

Additional information

This is one of several papers published in Autonomous Robots comprising the Special Issue on Learning for Human–Robot Collaboration.


About this article


Cite this article

Munzer, T., Toussaint, M. & Lopes, M. Efficient behavior learning in human–robot collaboration. Auton Robot 42, 1103–1115 (2018). https://doi.org/10.1007/s10514-017-9674-5

