
Planning for Human–Robot Interaction in Socially Situated Tasks

The Impact of Representing Time and Intention

International Journal of Social Robotics

Abstract

This article presents the results of a study on the effects of representing time and intention in models of socially situated tasks on the quality of policies for robot behavior. The ability to reason about how others’ observable actions relate to their unobservable intentions is an important part of interpreting and responding to social behavior. It is also often necessary to observe the timing of actions in order to disambiguate others’ intentions. Therefore, our proposed approach is to model these interactions as time-indexed partially observable Markov decision processes (POMDPs). The intentions of humans are represented as hidden state in the POMDP models, and the time-dependence of actions by both humans and the robot is explicitly modelled. Our hypothesis is that planning for these interactions with a model that represents time-dependent action outcomes and uncertainty about others’ intentions will achieve better results than planning with simpler models that make fixed assumptions about people’s intentionality or abstract away time-dependent effects. A driving interaction governed by social conventions and involving ambiguity in the other driver’s intent was used as the scenario with which to test this hypothesis. A robot car controlled by policies from time-dependent POMDP models or by policies from two less expressive model variants performed this interaction in a driving simulator with human drivers. The time-dependent POMDP policies achieved better results than those of the models without explicit time representation or human intention as hidden state, both according to the reward obtained and to people’s subjective impressions of how socially appropriate and natural the robot’s behavior was. These results demonstrate both the relative superiority of these representation choices and the effectiveness of this approach to planning for socially situated tasks.




Author information


Correspondence to Frank Broz.

Appendix: Model Description

The overall structure of the domain description file is given in Table 16. The state space is time-indexed, so the number of timesteps in an episode must be specified. The time index is represented by a special state variable named “Time” that always has the value of the current timestep. This variable may be referred to in the preconditions of action or reward rules but may not have its value changed by an action rule. The rest of the state space is defined by a list of state variable names together with the range of integer values that each variable may take.

Table 16 Structure of the model description file

The action space is defined in the same manner. Additionally, three special action variables are defined: “Human,” “Env,” and “Side Effects.” These are used to specify the action rules that describe the actions of the human agent and the environment. Because these actions are taken automatically in response to the actions of the robot at each timestep rather than being chosen by the robot, their ranges of action values are left undefined.
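Although the concrete file syntax is given in Table 16 rather than reproduced here, the logical structure just described can be sketched in code. The following Python rendering is hypothetical: the class name, field names, and variable names are ours, not part of the model description language.

```python
# A minimal sketch of the logical content of a domain description file
# (cf. Table 16). All names here are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class DomainDescription:
    num_timesteps: int            # episode length; the "Time" variable ranges over these
    state_vars: dict[str, int]    # state variable name -> number of integer values
    action_vars: dict[str, int]   # robot action variable -> number of values
    # Special action variables, applied automatically each timestep;
    # their value ranges are left undefined.
    special_action_vars: tuple[str, ...] = ("Human", "Env", "Side Effects")

# Example instantiation for the intersection domain described in A.1/A.2:
domain = DomainDescription(
    num_timesteps=60,
    state_vars={
        "HumanIntention": 2, "RobotPosition": 5, "HumanPosition": 5,
        "RobotAngle": 2, "RobotSpeed": 3, "HumanSpeed": 3,
        "HumanHeadlights": 2, "Collision": 3, "TrafficLight": 3,
    },
    action_vars={"SetSpeed": 3, "Turn": 1},
)
```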

A.1 States and Observations

The state space for the POMDP models is defined as follows:

  • human intention (2)—go first, yield

  • robot position (5)—the region occupied by the robot

  • human position (5)—the region occupied by the human

  • robot angle (2)—whether the car body is turned more than 45 degrees

  • robot speed (3)—stopped, rolling/creeping, fast

  • human speed (3)—stopped, rolling/creeping, fast

  • human headlights (2)—whether the human has flashed their headlights

  • collision (3)—no collision, collision at this timestep, collision has occurred

  • traffic light (3)—the color of the traffic light

  • time (60)—the current timestep

The intersection and the surrounding roads are partitioned into rectangular regions relative to each agent’s starting position that are significant to the interaction. These regions are: more than a car length from the start of the intersection, a car length from the start of the intersection, the half of the intersection closest to its start, the half of the intersection closest to its end, and the road on the other side of the intersection. There is an additional position value that corresponds to the event of an agent entering the region of the road beyond the intersection for the first timestep. This value is used to assign the one-time reward for an agent crossing the intersection.
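Multiplying out the cardinalities listed above shows the scale of the time-indexed model; a quick sketch (the ordering comment is ours):

```python
from math import prod

# Cardinalities of the state variables listed above (in order):
# intention, robot pos, human pos, angle, robot speed, human speed,
# headlights, collision, traffic light, time.
cardinalities = [2, 5, 5, 2, 3, 3, 2, 3, 3, 60]
print(prod(cardinalities))  # 972000 joint states in the time-indexed model
```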

Observations are created by defining a subset of the state variables that are directly observable. The observable state variables for this domain are as follows:

  • robot position

  • human position

  • human headlights

  • human speed

  • collision

  • traffic light

Because the robot’s current speed and angle should always be known based on the effects of the set speed and turn actions, these variables are not included in the observation space.
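An observation is thus a deterministic projection of the hidden state onto its observable components. A minimal sketch, assuming states are stored as name-to-value dicts (the function name and encoding are ours):

```python
# The directly observable state variables, per the list above.
OBSERVABLE = ("robot_position", "human_position", "human_headlights",
              "human_speed", "collision", "traffic_light")

def observation_of(state: dict[str, int]) -> tuple[int, ...]:
    # Project the full state onto the observable subset; variables not in
    # the subset (notably the human's intention) are dropped.
    return tuple(state[v] for v in OBSERVABLE)
```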

A.2 Actions and Action Rules

Four actions are available to the robot to control its motion. These are high-level actions that a low-level controller converts into control commands for the car in the driving simulator.

  • set speed (3)—stop, go slow, go fast

  • turn (1)—turn car body to the left

It is assumed that these actions produce their immediate effect as a change in state within the timestep in which they are applied. For example, if the car is going fast and the robot chooses the stop action, the car will be stopped by the next timestep. Other action effects, such as whether moving at a certain speed will carry the robot into the next region, are probabilistic.
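A sketch of this split between deterministic and probabilistic effects, with illustrative numbers (the transition probabilities and position encoding below are our assumptions, not values from the paper's model):

```python
import random

def apply_set_speed(state: dict[str, int], target: int) -> None:
    # Speed changes take effect within the timestep they are applied:
    # 0 = stopped, 1 = rolling/creeping, 2 = fast (encoding assumed).
    state["robot_speed"] = target

# Whether the current speed carries the robot into the next region is
# probabilistic; these probabilities are purely illustrative.
ADVANCE_PROB = {0: 0.0, 1: 0.3, 2: 0.8}

def maybe_advance_region(state: dict[str, int]) -> None:
    # Regions are assumed to be encoded 0..4 in order of progress.
    if state["robot_position"] < 4 and random.random() < ADVANCE_PROB[state["robot_speed"]]:
        state["robot_position"] += 1
```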

Further probabilistic action outcomes are created by sequentially applying the action rules for the special action variables to the states resulting from the robot’s action rules. The action rules for the “Human” action describe the possible behavior of the human. The action rules for the “Env” action have preconditions based on the time variable and control the changing of the traffic light. The action rules for the “Side Effects” action detect combinations of state variables indicating that the robot and human have driven into the same region in a way that may result in a collision.
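Putting the pieces together, one model timestep is a sequential composition of rule applications. A minimal sketch of that pipeline (the function names are ours; each stage stands for applying the matching set of action rules):

```python
# One timestep of the transition model: the robot's action rules fire
# first, then the special action variables are applied in sequence.
def step(state, robot_action, robot_rules, human_rules, env_rules, side_effect_rules):
    state = robot_rules(state, robot_action)  # robot's chosen action
    state = human_rules(state)                # "Human": the other driver's behavior
    state = env_rules(state)                  # "Env": traffic light changes, keyed on Time
    state = side_effect_rules(state)          # "Side Effects": detect possible collisions
    state["time"] += 1                        # advance the time index (assumed bookkeeping)
    return state
```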

Action rules are defined by the preconditions that determine whether or not the rule is applicable in a state, the possible effects of that action on the state variable values, and a weighting factor that specifies the relative likelihood of those outcomes. This action description language provides a simple and flexible way to express the state transition structure in terms of small subsets of relevant variables and their values. For each action rule, the action variable and the variable value that the rule describes are specified. To uniquely identify each action rule, an additional rule name is also given; this is necessary because the same combination of action variable and value may have different effects and different weights depending on the preconditions. Multiple action rules may be used to specify the full range of possible outcomes and their relative weights for a single action. The possible changes to state variables caused by an action rule are specified as a list of action effects, and each action effect may affect multiple state variables simultaneously. All of the effects for an action rule are given equal weight by default. The weight, if specified, is an integer that defines the ratio by which the listed action effects are less likely than the default. For example, a weight of 2 means that a rule’s action effects are half as likely.
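The rule format just described can be summarized as a small data type; a sketch, assuming dict-encoded states (all names are ours, and the weight semantics follow the text: weight-w effects are 1/w as likely as a default outcome):

```python
from dataclasses import dataclass

@dataclass
class ActionRule:
    action_var: str                # e.g. "Human" or a robot action variable
    action_value: int              # the action value this rule describes
    name: str                      # unique rule identifier
    conditions: dict[str, int]     # preconditions on state variable values
    effects: list[dict[str, int]]  # each effect may set several variables at once
    weight: int = 1                # effects are 1/weight as likely as a default outcome

    def applies_to(self, state: dict[str, int]) -> bool:
        # A rule is applicable when every precondition holds in the state.
        return all(state.get(var) == val for var, val in self.conditions.items())
```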

An example action rule describing the behavior of the human agent is shown in Table 17. The special “Human” action variable specifies that this action rule describes the behavior of the uncontrollable human agent. Because the robot cannot choose the actions taken by this other agent, the action value is 0, with “flash_no_yield” providing a unique identifier for this action rule. This action rule has a single possible effect: turning on the headlights of the human’s car. The condition list describes the state variable values that determine when this action rule may take effect. It only makes sense to apply this action in states where the headlights are off. The human may turn on the lights at any time before they move into the intersection. The goal condition restricts this rule to cases where the human has the intention not to yield to the turning driver. The velocity 0 condition reflects the fact that people were only observed to turn on their headlights when their car was not moving. This action effect is \(\frac{1}{100}\) as likely to occur as a default outcome because it is rare for a car that does not intend to yield to flash its headlights.

Table 17 An example action rule describing a light flashing behavior by the human
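For illustration, the rule of Table 17 might be encoded as follows using the ActionRule sketch above. The variable names and value encodings are assumptions (e.g. we encode the “go first” intention as 0), and the weight of 100 reflects the 1/100 likelihood described:

```python
# Hypothetical encoding of Table 17's rule, reusing the ActionRule sketch above.
flash_no_yield = ActionRule(
    action_var="Human",
    action_value=0,              # uncontrolled agent, so the action value is 0
    name="flash_no_yield",
    conditions={
        "human_headlights": 0,   # lights must currently be off
        "human_intention": 0,    # assumed encoding: 0 = "go first" (will not yield)
        "human_speed": 0,        # people only flashed while their car was stopped
        "human_position": 0,     # simplification: the paper allows any pre-intersection region
    },
    effects=[{"human_headlights": 1}],  # single effect: turn the headlights on
    weight=100,                  # 1/100 as likely as a default outcome
)
```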

Cite this article

Broz, F., Nourbakhsh, I. & Simmons, R. Planning for Human–Robot Interaction in Socially Situated Tasks. Int J of Soc Robotics 5, 193–214 (2013). https://doi.org/10.1007/s12369-013-0185-z
