Abstract
We present a novel method for a robot to interactively learn a joint human–robot task while executing it. We consider collaborative tasks carried out by a team of a human operator and a robot helper that adapts to the human's task-execution preferences. Different human operators can have different abilities, experience, and personal preferences, so that one allocation of activities within the team is preferred over another. Our main goal is to have the robot learn the task and the preferences of the user to enable a more efficient and acceptable joint task execution. We cast concurrent multi-agent collaboration as a semi-Markov decision process and show how to model the team behavior and learn the expected robot behavior. We further propose an interactive learning framework, which we evaluate both in simulation and on a real robotic setup to show that the system can effectively learn and adapt to human expectations.
Notes
We sometimes use the term policy to refer to a deterministic mapping from S to A.
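To make the note concrete, a deterministic policy in this sense can be sketched as a simple lookup from states to actions. The state and action names below are purely illustrative and not taken from the paper:

```python
# Illustrative sketch (not from the paper): a deterministic policy
# as a mapping from states S to actions A. State and action names
# here are hypothetical placeholders for a collaborative task.
policy = {
    "part_on_table": "pick_part",
    "part_in_gripper": "hand_over",
    "human_waiting": "hand_over",
    "idle": "wait",
}

def act(state):
    """Return the single action the deterministic policy assigns to `state`."""
    return policy[state]
```

Because the mapping is deterministic, each state yields exactly one action, e.g. `act("part_on_table")` returns `"pick_part"`.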
Acknowledgements
This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and by the EU FP7-ICT project 3rdHand under Grant Agreement No. 610878.
Additional information
This is one of several papers published in Autonomous Robots comprising the Special Issue on Learning for Human-Robot Collaboration.
Cite this article
Munzer, T., Toussaint, M. & Lopes, M. Efficient behavior learning in human–robot collaboration. Auton Robot 42, 1103–1115 (2018). https://doi.org/10.1007/s10514-017-9674-5