ABSTRACT
Reinforcement Learning (RL) is an effective method for robots to learn tasks. However, in typical RL, end-users have little to no control over how the robot performs a task once it has been deployed. To address this, we introduce online behavior modification, a paradigm in which users control behavior features of a robot in real time as it autonomously completes a task using an RL-trained policy. To show the value of this user-centered formulation for human-robot interaction, we present a behavior-diversity-based algorithm, Adjustable Control Of RL Dynamics (ACORD), and demonstrate its applicability to online behavior modification in simulation and a user study. In the study (n = 23), users adjust the style of paintings as a robot traces a shape autonomously. We compare ACORD to RL and Shared Autonomy (SA), and show that ACORD affords user-preferred levels of control and expression comparable to SA, but with the potential for the autonomous execution and robustness of RL. The code for this paper is available at https://github.com/AABL-Lab/HRI2024_ACORD
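The paradigm described above can be illustrated with a minimal sketch: a policy conditioned on a user-adjustable behavior vector, which the user can change in real time while the policy keeps acting autonomously. The class and method names (`BehaviorConditionedPolicy`, `set_behavior`) and the linear policy form are illustrative assumptions, not the paper's actual ACORD implementation.

```python
import numpy as np

class BehaviorConditionedPolicy:
    """Toy policy pi(a | s, z): the user adjusts the behavior vector z at
    runtime (e.g. via sliders), shifting the style of execution while the
    task-level policy continues to act. Purely illustrative; in ACORD the
    weights would come from behavior-diversity RL training."""

    def __init__(self, state_dim, action_dim, behavior_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed "trained" weights standing in for a learned network.
        self.W_s = rng.normal(size=(action_dim, state_dim))
        self.W_z = rng.normal(size=(action_dim, behavior_dim))
        self.z = np.zeros(behavior_dim)  # current user behavior setting

    def set_behavior(self, z):
        # Called from the user interface during autonomous execution.
        self.z = np.asarray(z, dtype=float)

    def act(self, state):
        # Action depends on both the task state and the behavior setting.
        return self.W_s @ np.asarray(state, dtype=float) + self.W_z @ self.z
```

Acting twice from the same state with two different behavior settings yields different actions, which is the essence of online behavior modification: the user changes *how* the task is done without taking over *whether* it is done.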
Supplemental Material
Available for Download
The appendix folder contains the supplementary material referenced in the paper, including hyperparameters, an ablation study, and images of all the paintings from the user study. The code folder contains Python code identical to the code available at the URL in the paper's abstract. The video folder contains a short video depicting robot execution of each of the three conditions presented in the paper.
Index Terms
- Online Behavior Modification for Expressive User Control of RL-Trained Robots