Research Article | Open Access
DOI: 10.1145/3610977.3634947

Online Behavior Modification for Expressive User Control of RL-Trained Robots

Published: 11 March 2024

ABSTRACT

Reinforcement Learning (RL) is an effective method for robots to learn tasks. However, in typical RL, end-users have little to no control over how the robot does the task after the robot has been deployed. To address this, we introduce the idea of online behavior modification, a paradigm in which users have control over behavior features of a robot in real time as it autonomously completes a task using an RL-trained policy. To show the value of this user-centered formulation for human-robot interaction, we present a behavior-diversity-based algorithm, Adjustable Control Of RL Dynamics (ACORD), and demonstrate its applicability to online behavior modification in simulation and a user study. In the study (n = 23), users adjust the style of paintings as a robot traces a shape autonomously. We compare ACORD to RL and Shared Autonomy (SA), and show that ACORD affords user-preferred levels of control and expression, comparable to SA, but with the potential for the autonomous execution and robustness of RL. The code for this paper is available at https://github.com/AABL-Lab/HRI2024_ACORD.
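To make the paradigm concrete, below is a minimal sketch of an online behavior modification loop, assuming a behavior-conditioned policy pi(a | s, z) and a classic Gym-style environment. The names BehaviorConditionedPolicy, run_episode, and get_user_behavior are hypothetical illustrations, not the ACORD implementation; see the linked repository for the authors' code.

```python
# Illustrative sketch of online behavior modification (assumed structure,
# not the ACORD algorithm from the linked repository).
import numpy as np

class BehaviorConditionedPolicy:
    """Hypothetical trained policy pi(a | s, z): the behavior parameter z
    selects among diverse styles of completing the same task."""

    def __init__(self, weights: np.ndarray):
        self.weights = weights  # stand-in for trained network parameters

    def act(self, state: np.ndarray, z: np.ndarray) -> np.ndarray:
        # Condition on z by concatenating it with the state, a common
        # pattern for goal- or behavior-conditioned policies.
        x = np.concatenate([state, z])
        return np.tanh(self.weights @ x)  # bounded continuous action

def run_episode(env, policy, get_user_behavior):
    """Task execution stays autonomous; the user only adjusts z
    (e.g., with a slider) while the episode runs."""
    state, done = env.reset(), False
    while not done:
        z = get_user_behavior()        # read the user's current setting
        action = policy.act(state, z)  # robot acts from the RL policy
        state, reward, done, info = env.step(action)  # classic Gym API
```

The point the sketch illustrates is that user input changes only the behavior parameter z (the style), while action selection remains fully autonomous under the trained policy, in contrast to shared autonomy, where the user shares in action-level control.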



Published in

HRI '24: Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction
March 2024, 982 pages
ISBN: 9798400703225
DOI: 10.1145/3610977
Copyright © 2024 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 242 of 1,000 submissions, 24%
