DOI: 10.1145/3568294.3579971
Extended Abstract

Aligning Robot Behaviors with Human Intents by Exposing Learned Behaviors and Resolving Misspecifications

Published: 13 March 2023

Abstract

Human-robot interaction is limited in large part by the challenge of writing correct specifications for robots. The research community seeks alignment between humans' goals and robot behaviors, but this alignment is difficult to achieve. My research tackles this problem. I view alignment as the consequence of iterative design and ample testing, and I design methods in service of these processes. I first study how humans currently write reward functions, profiling the typical errors they make when doing so. I then study how humans can inspect the behaviors robots learn from a given specification. A typical approach requires unstructured or hand-designed test cases; I instead introduce a Bayesian inference method for finding behavior examples that cover information-rich test cases. Alongside finding these behavior examples, I study how they should be presented to the human by applying cognitive theories of human concept learning. For the remainder of my thesis, I am pursuing two open questions. The first concerns how these components can be combined so that humans can iteratively design better behavioral specifications. The second concerns how robots can better interpret humans' erroneous specifications and infer their true intent in spite of the errors.
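As a rough illustration of the kind of Bayesian search for behavior examples described above (a minimal sketch only, not the author's implementation), the snippet below runs a Metropolis-Hastings sampler over a toy input space to find examples on which a small classifier is maximally ambiguous. The toy model, proposal scale, and kernel bandwidth are all illustrative assumptions, not details from the paper.

import numpy as np

rng = np.random.default_rng(0)

def model_confidence(x):
    """Toy binary classifier: probability of class 1 for 2-D input(s) x."""
    w, b = np.array([1.5, -2.0]), 0.3
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def behavior_likelihood(x, target=0.5, bandwidth=0.05):
    """Score how closely the model's confidence on x matches the target behavior
    (target=0.5 means 'maximally ambiguous', i.e. near the decision boundary)."""
    return np.exp(-0.5 * ((model_confidence(x) - target) / bandwidth) ** 2)

def sample_behavior_examples(n_steps=5000, proposal_scale=0.3):
    """Metropolis-Hastings over the input space; accepted samples concentrate on
    inputs that elicit the target behavior from the model."""
    x = rng.normal(size=2)
    samples = []
    for _ in range(n_steps):
        x_proposed = x + rng.normal(scale=proposal_scale, size=2)
        accept_prob = behavior_likelihood(x_proposed) / max(behavior_likelihood(x), 1e-12)
        if rng.random() < accept_prob:
            x = x_proposed
        samples.append(x.copy())
    return np.array(samples)

examples = sample_behavior_examples()
# The tail of the chain should hover near confidence 0.5 (the ambiguous region),
# yielding a pool of information-rich test cases for a human to inspect.
print("mean confidence over last 1000 samples:",
      model_confidence(examples[-1000:]).mean())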


Published In

HRI '23: Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction
March 2023, 612 pages
ISBN: 9781450399708
DOI: 10.1145/3568294

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. explainable ai
      2. human-robot interaction
      3. reward design

      Qualifiers

      • Extended-abstract

      Funding Sources

      • NSF

      Conference

      HRI '23

      Acceptance Rates

      Overall Acceptance Rate 268 of 1,124 submissions, 24%
