Abstract
This article presents the results of a study on the effects of representing time and intention in models of socially situated tasks on the quality of policies for robot behavior. The ability to reason about how others’ observable actions relate to their unobservable intentions is an important part of interpreting and responding to social behavior. It is also often necessary to observe the timing of actions in order to disambiguate others’ intentions. Therefore, our proposed approach is to model these interactions as time-indexed partially observable Markov decision processes (POMDPs). The intentions of humans are represented as hidden state in the POMDP models, and the time-dependence of actions by both humans and the robot is explicitly modelled. Our hypothesis is that planning for these interactions with a model that represents time-dependent action outcomes and uncertainty about others’ intentions will achieve better results than simpler models that make fixed assumptions about people’s intentionality or abstract away time-dependent effects. A driving interaction governed by social conventions and involving ambiguity in the other driver’s intent was used as the scenario with which to test this hypothesis. A robot car controlled by policies from time-dependent POMDP models or by policies from two less expressive model variants performed this interaction in a driving simulator with human drivers. The time-dependent POMDP policies achieved better results than those of the models without explicit time representation or human intention as hidden state, both according to the reward obtained and to people’s subjective impressions of how socially appropriate and natural the robot’s behavior was. These results demonstrate both the relative superiority of these representation choices and the effectiveness of this approach to planning for socially situated tasks.
Appendix: Model Description
The overall structure of the domain description file is given in Table 16. The state space is time-indexed, so the number of timesteps in an episode must be specified. The time index is represented by a special state variable named “Time” that always has the value of the current timestep. This variable may be referred to in the preconditions of action or reward rules but may not have its value changed by an action rule. The rest of the state space is defined by a list of state variable names with the corresponding range of integer values that each variable may take.
The action space is defined in the same manner. Additionally, three special action variables are defined: “Human,” “Env,” and “Side Effects.” These are used to specify the action rules that describe the actions of the human agent and the environment. Because these actions are taken automatically in response to the actions of the robot at each timestep rather than being chosen by the robot, their ranges of action values are left undefined.
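To make this structure concrete, the sketch below mirrors the domain description in Python. This is a hypothetical rendering for illustration only; the names (`DomainDescription`, `add_state_var`, etc.) are assumptions and do not reflect the actual syntax of the description file in Table 16.

```python
from dataclasses import dataclass, field

@dataclass
class DomainDescription:
    """Illustrative in-code mirror of the domain description file's structure."""
    num_timesteps: int                                          # episode length; drives the "Time" variable
    state_vars: dict[str, int] = field(default_factory=dict)    # variable name -> number of integer values
    action_vars: dict[str, int] = field(default_factory=dict)   # robot-chosen actions and their value ranges
    # Special actions applied automatically each timestep rather than chosen
    # by the robot; their value ranges are left undefined.
    special_actions: tuple[str, ...] = ("Human", "Env", "Side Effects")

    def add_state_var(self, name: str, num_values: int) -> None:
        # "Time" is reserved for the time index and cannot be redefined.
        assert name != "Time", '"Time" is a reserved state variable'
        self.state_vars[name] = num_values

# A 60-timestep episode with two of the state variables from Sect. A.1:
domain = DomainDescription(num_timesteps=60)
domain.add_state_var("human_intention", 2)
domain.add_state_var("robot_position", 5)
```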
A.1 States and Observations
The state space for the POMDP models is defined as follows:
- human intention (2)—go first, yield
- robot position (5)—the region occupied by the robot
- human position (5)—the region occupied by the human
- robot angle (2)—whether car body is turned more than 45 degrees
- robot speed (3)—stopped, rolling/creeping, fast
- human speed (3)—stopped, rolling/creeping, fast
- human headlights (2)—whether the human has flashed their headlights
- collision (3)—no collision, collision at this timestep, collision has occurred
- traffic light (3)—the color of the traffic light
- time (60)—the current timestep
The intersection and the surrounding roads are partitioned into rectangular regions relative to each agent’s starting position that are significant to the interaction. These regions are: more than a car length from the start of the intersection, a car length from the start of the intersection, the half of the intersection closest to its start, the half of the intersection closest to its end, and the road on the other side of the intersection. There is an additional position value that corresponds to the event of an agent entering the region of the road beyond the intersection for the first timestep. This value is used to assign the one-time reward for an agent crossing the intersection.
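A minimal sketch of this partition, assuming progress is measured in car lengths from the start of the intersection; the function name and the threshold values are assumptions made for illustration, not taken from the model files.

```python
def position_region(progress: float, crossed_before: bool) -> int:
    """Map an agent's longitudinal progress (in car lengths from the start of
    the intersection) to one of the six position values described above.
    Thresholds are illustrative assumptions."""
    if progress < -1.0:
        return 0  # more than a car length from the start of the intersection
    if progress < 0.0:
        return 1  # within a car length of the start of the intersection
    if progress < 0.5:
        return 2  # half of the intersection closest to its start
    if progress < 1.0:
        return 3  # half of the intersection closest to its end
    # Beyond the intersection: the first timestep there is a distinct value,
    # used to assign the one-time reward for crossing.
    return 4 if crossed_before else 5
```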
Observations are created by defining a subset of the state variables that are directly observable. The observable state variables for this domain are as follows:
- robot position
- human position
- human headlights
- human speed
- collision
- traffic light
Because the robot’s current speed and angle should always be known based on the effects of the set speed and turn actions, these variables are not included in the observation space.
A.2 Actions and Action Rules
Four actions are available to the robot to control its motion. These are high-level actions that must be converted into control commands for the car in the driving simulator by a low-level controller.
- set speed (3)—stop, go slow, go fast
- turn (1)—turn car body to the left
It is assumed that these actions can produce their immediate effect as a change in state within the timestep they are applied. For example, if the car is going fast and the robot chooses the stop action, the car will be stopped by the next timestep. Other action effects, such as whether moving at a certain speed will cause the robot to move into the next region, are probabilistic.
Further probabilistic action outcomes are created by sequentially applying the action rules for the special action variables to the states resulting from the robot’s action rules. The action rules for the “Human” action describe the possible behavior of the human. The action rules for the “Env” action have preconditions based on the time variable and control the changing of the traffic light. The action rules for the “Side Effects” action detect combinations of state variables indicating that the robot and human have driven into the same region in a way that may result in a collision.
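The per-timestep ordering described above can be sketched as follows. This is an illustrative reconstruction, not the planner's actual code; `apply_rules` is a hypothetical placeholder for whatever applies the matching action rules to a state.

```python
def step(state, robot_action, apply_rules):
    """Apply one timestep's transitions in the order described above:
    the robot's chosen action first, then the special action variables
    are applied sequentially to the resulting state.
    apply_rules(state, action_var) -> new state (illustrative signature)."""
    state = apply_rules(state, robot_action)
    for special in ("Human", "Env", "Side Effects"):
        state = apply_rules(state, special)
    return state
```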
Action rules are defined by the preconditions that determine whether or not the rule is applicable in a state, the possible effects of that action on the state variable values, and a weighting factor that specifies the relative likelihood of those outcomes. This action description language provides a simple and flexible way to express the state transition structure in terms of small subsets of relevant variables and their values. For each action rule, the action variable and variable value that the rule describes is specified. In order to uniquely identify each action rule, an additional rule name is also given. This is necessary because the same combination of action variable and value may have different effects and different weights depending on the preconditions. Multiple action rules may be used to specify the full range of possible outcomes and their relative weights for a single action. The possible changes to state variables caused by an action rule are specified as a list of action effects. Each action effect may affect multiple state variables simultaneously. All of the effects for an action rule are given equal weight by default. The weight, if specified, is an integer that defines a ratio of how much less likely the action effects listed are than the default. For example, a weight of 2 would mean that a rule’s action effects are half as likely.
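The weight convention above can be made concrete with a short sketch. Here each candidate outcome is treated as one weighted effect (a simplification of the rule-level weighting in the text, made for illustration): an unweighted outcome gets the default weight 1, and a weight w makes an outcome 1/w as likely as the default before normalization.

```python
from fractions import Fraction

def outcome_probabilities(effects_with_weights):
    """effects_with_weights: list of (effect, weight) pairs, weight=1 being
    the default. A weight w makes that effect 1/w as likely as a default
    effect; the results are normalized into a probability distribution."""
    raw = [(effect, Fraction(1, weight)) for effect, weight in effects_with_weights]
    total = sum(p for _, p in raw)
    return [(effect, p / total) for effect, p in raw]

# Two default-weight outcomes and one with weight 2 (half as likely):
probs = outcome_probabilities([("stay", 1), ("advance", 1), ("skid", 2)])
```

With these illustrative outcomes, "stay" and "advance" each get probability 2/5 and the weight-2 "skid" gets 1/5, preserving the stated 2:2:1 likelihood ratio.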
An example action rule describing the behavior of the human agent is shown in Table 17. The special “Human” action variable specifies that this action rule describes the behavior of the uncontrollable human agent. Because the robot cannot choose the actions taken by this other agent, the action value is 0, with “flash_no_yield” providing a unique identifier for this action rule. This action rule has a single possible effect, to turn on the headlights of the human’s car. The condition list describes the state variable values that determine when this action rule may take effect. It only makes sense to apply this action in states where the headlights are off. The human may turn on the lights at any time before they move into the intersection. The goal condition restricts this rule to cases where the human has the intention not to yield to the turning driver. The velocity 0 condition reflects the fact that people were only observed to turn on their headlights when their car was not moving. This action effect is \(\frac{1}{100}\) as likely to occur as a default outcome because a car that doesn’t intend to yield flashing its headlights is a rare event.
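For readers without access to Table 17, the rule just described might be rendered as the following hypothetical structure. The field names and encodings (e.g. 0 for "go first") are assumptions made for this sketch and are not the actual file syntax.

```python
# The "flash_no_yield" rule described above, as an illustrative structure.
flash_no_yield = {
    "action_var": "Human",      # special variable: the uncontrollable human agent
    "action_value": 0,          # not chosen by the robot, so the value is 0
    "rule_name": "flash_no_yield",
    "preconditions": {
        "human_headlights": 0,  # lights must currently be off
        "human_intention": 0,   # assumed encoding: 0 = "go first" (will not yield)
        "human_speed": 0,       # people only flashed lights while stopped
        # (a position precondition keeping the human out of the
        #  intersection is omitted from this sketch)
    },
    "effects": [{"human_headlights": 1}],  # single effect: turn the headlights on
    "weight": 100,              # 1/100 as likely as a default outcome
}
```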
Broz, F., Nourbakhsh, I. & Simmons, R. Planning for Human–Robot Interaction in Socially Situated Tasks. Int J of Soc Robotics 5, 193–214 (2013). https://doi.org/10.1007/s12369-013-0185-z