The Role of Functional Affordances in Socializing Robots

Awaad, Iman; Kraetzschmar, Gerhard K.; Hertzberg, Joachim

doi:10.1007/s12369-015-0281-3

The Role of Functional Affordances in Socializing Robots

Published: 27 March 2015

Volume 7, pages 421–438, (2015)
Cite this article

Download PDF

International Journal of Social Robotics Aims and scope Submit manuscript

The Role of Functional Affordances in Socializing Robots

Download PDF

Iman Awaad¹,
Gerhard K. Kraetzschmar¹ &
Joachim Hertzberg²

923 Accesses
25 Citations
Explore all metrics

Abstract

Just as humans behave according to the social norms of their groups, autonomous systems that become part of these groups also need to behave in socially-expected and accepted ways. For humans these social norms are learned through interaction with members of the group. In this work, we propose that the functional affordances of objects, what objects are meant to be used for, provide us with a starting point for the socialization of such agents. We model these functional affordances in description logics and show how this enables the socially-expected human behavior of substituting objects as needed to achieve a goal. In addition, we propose to combine these affordances with conceptual similarity and proximity in order to make more complex substitutions, which are socially acceptable in their given context. Finally, we describe how their use would allow the agent to take advantage of opportunities and how they are modified and extended through interaction with humans.

An Affordance-Based Conceptual Framework for Spatial Behavior of Social Robots

An Extended Framework for Characterizing Social Robots

The Secret Life of Robots: Perspectives and Challenges for Robot’s Behaviours During Non-interactive Tasks

Article 23 April 2020

Silvia Rossi, Alessandra Rossi & Kerstin Dautenhahn

1 Introduction

Domestic service robots are expected to carry out tasks around the house, such as cleaning, fetching objects, serving drinks and so on. Their success has traditionally been based on their ability to understand commands and accomplish the given tasks. Such agents are typically mobile manipulators with capabilities at varying levels of complexity, such as perception, mapping, manipulation, navigation, dialog management and task planning. The integration of these capabilities to form an architecture which enables a flexible and robust agent remains a focus of much research.

An agent which is also capable of learning, be it from demonstration, by experimentation, or by querying external knowledge bases, is no longer simply desirable, but necessary for life-long learning. We argue, that in addition to this, domestic service robots need the capability to acquire, assimilate and apply the social norms of the group with which they interact so that they can behave in socially-expected and accepted ways. Humans cohabiting the environment would gain from the more natural human-robot interaction which would result, and the agents would also gain from the flexibility that these norms provide humans when performing their everyday tasks.

Children pick up social norms mainly through interaction with their families, and usually from schools, their own peers and (fortunately, or not) the media as well. They learn how to perform tasks, as well as how not to perform them (e.g. clothes should be hung or folded and placed in the closet; or that a glass of water should be placed on a coaster and not directly on the table). They learn manners (e.g. that they should use the word ‘please’ when asking for something), and they learn when and which substitutions are acceptable (e.g. that a mug may be substituted for a glass but not the other way around, and that such a substitution is not appropriate when the drink is for a guest). Such knowledge clearly goes beyond how to accomplish tasks.

Let us consider the task of serving water to someone. Humans know that a glass should be filled with water and served to the guest, and that glasses are in a particular cupboard in the kitchen. Robots should know this too. Should humans not find any glasses there, they know that there may be some glasses in a sink, or a dishwasher, but that they should be checked to ensure that they are clean. Robots should probably know this too. In some cases, humans may choose to simply use a mug instead of washing a glass. Robots may know that such substitutions are possible if this is specified explicitly. When serving the person, humans know that the glass should be placed on a coaster. Robots should know this too.

There are various types of knowledge described in the use case above. There is the knowledge of the goal (serving a glass of water) and matching this with the procedural knowledge of how to go about doing this. Usually, this knowledge includes the objects with which the tasks are to be accomplished (e.g. the glass, the coaster, the cupboard). There is also the knowledge of when it is socially acceptable to make substitutions and when it is not.

While the agent should be capable of learning much of the knowledge mentioned above, it is also expected to function sufficiently well, out-of-the-box. Knowing how to do many things, however, is not sufficient, nor can this be defined as intelligence. “ ... The true test of intelligence is how we behave when we don’t know what to do” [24].

What we want is an approach that allows the agent to determine when it has insufficient knowledge, to acquire it, when possible, or find an alternative, and then successfully carry out the task.

Our approach adopts the open world assumption (through the use of DL which by default assumes incomplete information), so unless our knowledge base contains a statement (or can infer one) to the effect that something is true or that it is false, our query would return ‘do not know’.

In the work presented here, knowledge of how best to carry out basic tasks is encapsulated within Hierarchical task network (HTN) planning [11] methods and operators. Methods recursively decompose complex tasks into primitive ones which can be carried out through the execution of grounded operators. Together with the state of the world, these methods and operators constitute the planning domain.

In an environment shared by humans and artificial agents, this approach is beneficial, as it is more understandable for humans; and a good agent should be able to communicate its plan at all times [7]. In addition, it lets the human user specify the way he/she wishes to have a task accomplished in an intuitive way.

Let us consider the task of watering a plant; the domain modeler would specify methods and operators which describe how the task is to be accomplished and specify that a watering can should be used. A planner queries for the initial state of the world and, given the goal, generates a task network. The plan is simply the sequence of actions found in the leaf nodes of the network from left to right.

If there is no watering can in the domain or, despite all of our methods and operators, no decomposition is found to accomplish the task, the plan generation process will fail. For example, when the watering can exists but is inaccessible and we have no means by which to make it accessible.

One would intuitively expect a human to ask for help or to simply use something else to accomplish the task (e.g. a tea kettle or a jug). This ability to effortlessly adapt our actions to unexpected situations, especially given the dynamic nature of our environment and the amount of uncertainty about it, is perhaps one of the most underestimated human abilities. Very often, changes in our plans have to do not so much with how we carry out a task, but with what we carry it out.

Similarly, it would be desirable for an autonomous agent to ask for help (instead of simply communicating that it cannot accomplish the task). It would be even better if the agent could itself reason about what a good substitution would be and ask for a user’s approval before attempting to make the substitution.

This work argues for both the benefits that come from allowing agents to make substitutions, and demonstrates how the use of functional affordances, conceptual similarity and spatial proximity can allow agents to reason about and identify appropriate substitutions.

2 Affordances

The concept of affordances provides us with the necessary perspective with which to equip agents to behave with such flexibility. Affordances describe “opportunities for action” [17]. This work adopts the notion of affordances, although Gibson’s action/perception coupling is not dealt with directly. Gibson’s original definition has been refined by many researchers, but a generally agreed upon interpretation narrows the list of action choices to those of which an actor is aware. Using the refined definition, affordances are neither solely a property of the object nor of the actor, but of their relationship.

Under Gibson’s original definition, the set of affordances for a given object may be quite large, and may include actions that are neither socially expected nor socially acceptable (e.g. throwing a chair). In this work, we adopt Norman’s definition of perceived affordances which allude to “how an object may be interacted with based on the actor’s goals, plans, values, beliefs and past experience” [37]. This is consistent with our ideal domestic service robot: a goal-based [50] agent that can also learn from experience, and adheres to the values and beliefs of its group.

2.1 Distributed Cognition

We need a starting point for our agent’s socialization process—a kernel of norms, if you will, which represents these values and beliefs in a manner that permits the agent to make appropriate decisions. Where could knowledge of social norms come from, and what does it look like? To answer this question, a paradigm shift is necessary to view “knowledge” as facts that have been shaped by the values, beliefs and experiences of groups of people. For example, the fact that teacups are for drinking tea may not hold in Japan where tea is drunk from a bowl, or in Argentina where it is drunk from a hollow gourd. In fact, the word ‘tea’ itself would no doubt refer to different types of tea altogether. The English definition of a teacup is rooted in the English cultural tradition of drinking tea. We know this from experience (our own or, interestingly, that of other individuals of the group).

With this in mind, we note Hutchins’s theory of distributed reasoning and cognition which states that knowledge lies not only within the individual, but in the individual’s social and physical environment [26]. Others have further elaborated this idea [64]. This concept is appealing, as it acknowledges the impact that social groups have in shaping what we know. Moreover, it implies that it is no longer necessary for one to experience something him, her or itself in order to know it.

One could, therefore, argue that resources such as dictionaries, the Internet, WordNet, ConceptNet, OpenCyc [8], and the work of projects such as RoboEarth [25] are an example of distributed cognition, albeit for those groups whose native language is English since “language does not exist apart from culture” [51]. This paradigm allows us to reformulate the question: How can the agent acquire, reason and manipulate knowledge to behave in a socially-compliant manner?

2.2 Functional Affordances

The answer lies in the simple notion that objects are made to be used for (or exist for use in) specific tasks, and that this knowledge has been shaped by the norms of the group. Such functional affordances [21] link the idea of “purposeful actions” to the objects, and account for descriptive social norms (“what is typically done in a given setting” [45, p. 104). They include within them the “values and beliefs” and provide us with a starting point for “past experience”. They can then be manipulated and adapted based on further interaction with the social group that the agent is part of. The result is behavior that is socially expected; we are using objects for what they were meant to be used for.

There are other benefits to using functional affordances. By considering functional affordances, and not all “opportunities for action”, the action space is reduced. They also allow users to specify more general tasks. For example, when asked to “serve a drink”, any object meant ‘for drinking’ could be served without the need to explicitly specify a particular drink. This is more important than it seems at first glance, since much of our interaction with each other involves a great deal of underspecification.

We propose to use dictionaries as a source from which an agent acquires these functional affordances. They provide concise and unambiguous definitions of objects that almost always include their function. For example, a teacup, is defined as “a cup from which tea is drunk” [33], and a cup is “a small, bowl-shaped container for drinking from, typically having a handle” [33]. Therefore, dictionaries make ideal sources to mine the functional affordances of objects from.

Objects may have more than one functional affordance (e.g. a bottle has the primary functional affordance of storing liquids but a secondary functional affordance can be learned through interaction: it is also for drinking from). These affordances are included within the domain model and are represented compactly in DL. This allows us to use the reasoning powers of existing tools to bring about the robust and flexible behavior described above. The functional affordances of parts of objects are also modeled. This has two main benefits. First, it acts as a causal link, explaining why an object has a given affordance, and second, it helps the agent to recognize affordance cues or stimuli [14] at execution time and respond to them.

Our HTN planning domain already provides us with the ‘best way’ of carrying out a task (e.g. it would specify that tea should be served in a teacup). Knowing the function of an object allows us to behave flexibly in case of plan generation failure (e.g. we know that all teacups are dirty, and we do not know how to make them clean, so a plan cannot be generated) or execution failure (e.g. we did not know they were all dirty at planning time but found out during the course of execution). Choosing to use another object with the same functional affordance is the socially-expected and generally accepted course of action.

3 Approach to Socializing Agents

In this section, we provide an overview of how our affordance-based approach leads to flexibility, makes for compact representations, and allows the social norms to be refined to those of the group through interaction. For example, humans cohabiting the environment might ask the robot to clean the bathrooms only with the blue cleaning cloths, or to serve them tea only in their favorite cup.

3.1 Socially-Expected Behavior

In Sect. 1, we saw how the act of making substitutions is a socially-expected behavior in itself. We expect that people are able to find ways to accomplish their tasks under all but the most extreme cases. In this section we demonstrate how the combination of procedural knowledge (how to accomplish a task) and the functional affordances of objects (what objects are meant to be used for) together provide us with the socially-expected choice of the substituted objects (e.g. glasses and mugs are both used to drink from).

Simply querying the knowledge base (KB) for objects with the given functional affordance provides us with an appropriate substitute. This is accomplished without the need for cumbersome, ad hoc and often subjective categorization of objects. For example, ontologies of domestic objects (such as those mentioned in Sect. 2.1) may contain categories such as ‘furniture’ or ‘perishable objects’. The problem with this is, first, that it simply refers to qualities that a group of items may have (in the case of perishable goods, they will eventually perish); and that the decision of whether an object belongs to such a category or not may be subjective (is a spoon or a chandelier considered furniture?).

World models tend to describe the form of the world: objects, their shapes, colors or locations and their spatial relationship to one another. The same world can be described by the functions it is meant to afford. Studies in child psychology have found that children use functional affordances to generalize the name of newly-learned artifact categories and otherwise rely on global similarity when they could not interact with the objects [36].

There are cases, however, when using functional affordances alone will not be enough. Some objects are used for a very specific task (e.g. watering cans are used for watering plants). The only other object which is used for the same task would be a ‘hose’, and this is only for watering plants outdoors. In this case, both share the same functional affordance of watering plants; but whereas it may be desirable to substitute the watering can for the hose, the opposite is not true, and so a substitution using only functional affordances may not be possible.

Here, the agent would need to look for objects which are conceptually similar to the watering can. The similarity measures which are often used may not yield the results we have in mind (we may not care about the color of an object, but rather the presence of a handle for example).

For describing similarity, we propose the use of Conceptual Spaces [15] (CS). They provide a multidimensional feature space where each axis represents a quality dimension (e.g. brightness, intensity, and hue). Points in a conceptual space represent objects, while regions represent concepts.

Let us take the example given in [15]: the three quality dimensions in our example can together be used to describe the ‘color’ domain (see Fig. 1). A region on the red axis could be described as having the property ‘red’. A point in this region could represent the concept ‘apple’ in conjunction with other domains such as ‘taste’ or ‘shape’. We could even relate the property ‘red’ to the taste ‘sweet’.

We propose that the agent should learn the relation between these quality dimensions and given tasks. For example, for lifting an object, the most important quality dimension is its weight – its color would be irrelevant. These relations could then be used as weighting factors to determine how well an object would substitute for another in achieving a given task (similarity would be measured as the weighted Euclidean distance).

Conceptual spaces can also represent shape (e.g. handles and spouts). The detection of these quality dimensions obviously requires more processing by the perception components than for example, the simple detection of hue. To substitute a watering can to water plants, the capacity to hold water is the most important affordance, followed by the presence of a handle and a spout. Using conceptual spaces, the agent might find that the tea kettle is the most appropriate substitution. The combination of active perception at execution time and task-oriented perception would allow the agent to actively search for those features (e.g. spouts and handles) which are relevant to the task at hand, as opposed to passively picking up any and all cues. [2] has shown that the time complexity for such a search is far better when compared to a data-driven search.

3.2 Socially-Accepted Behavior

Having shown how functional affordances provide us with the ability to make basic socially-expected substitutions, we now present how socially-acceptable substitutions can be made by combining them with conceptual similarity and proximity in various ways to create a hierarchy of constraints.

Using lifting, functional affordances and conceptual similarity, an artificial agent starts by attempting to satisfy the constraints specified in the methods and operators (e.g only use a unique instance, such as my teacup—if this was specified in the goal—or an instance of a given object). If it fails to find the suitable object, it would iteratively attempt to find objects which satisfy fewer and fewer constraints (see Fig. 2).

The first level above that of using any instance of a given object (e.g. a teacup) is to use any object with the same functional affordance and high conceptual similarity (e.g. a mug). The next higher level would remove the constraint that the substitute should be conceptually similar, relying only on a shared functional affordance (e.g. a drinking flask). Should the agent not find such objects and given the old adage that “form follows function” (the form of objects is based on their function), conceptual similarity is then used to identify those objects which do not share the same functional affordance and yet are conceptually similar (e.g. a measuring cup). The top level attempts to infer the function-relevant attributes and identify objects matching these properties (e.g. a jar).

It is important to note here that injunctive social norms (“what is typically approved in society” [45], p. 104) are highly dependent on context and may differ from person to person. For example, it may be acceptable for me to have my tea served in a mug, but this may not be acceptable in the presence of guests, or for another user.

The ability of the agent to acquire and manage user preferences in their proper context would provide a significant improvement of the human robot interaction experience.

Humans prefer to take advantage of objects within their immediate spatial surroundings when making substitutions (e.g. using magazines which may be on the table instead of a coaster). Agents should also exploit spatial proximity. The importance of proximity can be altered to make it either easier to move from one level of constraints to another (to climb the ‘flexibility ladder’ as it were) by increasing its importance (prefer objects which are close, even if they are in a less constrained category of objects), or more difficult to move up by decreasing its importance.

The work presented in [28] and [44], while addressing a different focus, is perhaps the closest to ours in that they also use functional affordance and conceptual spaces to measure similarity. Their work is based on an adapted version of the HIPE theory of action and so they have included additional types of affordances based on both physical and socio-institutional constraints.

4 Implementation

A holistic approach enabling the agent to flexibly plan and act in a domestic service setting is presented here. In plan-based robot control, “...robots generate control actions by maintaining and executing a plan that is effective and has a high expected utility with respect to the robots’ current goals and beliefs” [6]. Our approach to enabling socially-expected and accepted behavior is a plan-based robot control architecture where the plans themselves are managed and adapted through the use of object categories, functional affordances, conceptual similarity and proximity and of course by changes in the environment and the user’s goals. The current state of the work has enabled us to identify, plan and use substitutes using object categories and functional affordances.

This is accomplished through three reasoning phases. The first phase generates a focused planning problem which only includes that part of the domain (e.g. methods, operators, objects, and their states) which is relevant for the given task. The second phase expands the domain where necessary to include possible substitutes and the methods and operators which may be specific to their use, and re-plans to use the substitute. The third and final reasoning phase uses affordances during plan execution to take advantage of opportunities. These are described in the following sections.

4.1 Phase I: Creating a Focused Planning Problem

While the HTN approach’s use of domain knowledge to focus the search for a plan has contributed to its popularity, domains with many instances of objects and numerous methods to decompose the same task will result in a large search space. Hartanto [20] combined HTN planning and DL reasoning to create focused planning problems by including more domain-specific information on relevance. For example, in a navigation domain, the approach and the modeling of the domain in OWL-DL ensured that only rooms with open doors were included in the description of the initial state (with the assumption that the robot could not open closed doors). Hartanto also (empirically) showed the value of integrating HTN planning and DL reasoning.

In our approach, we extend this implementation in two ways. By modeling functional affordances, we are able to focus and/or expand the domain in a systematic way. Second, the conversion of the domain from the OWL-DL syntax to the JSHOP2 [27] syntax (and vice versa) is accomplished via a model-to-model transformation (M2M), as opposed to simply including the JSHOP2 code as a parameter which would need to be adjusted manually if any change is made to the OWL-DL method or operator. M2M is a model-driven software engineering technique which is used to automatically transform one representation into another.

An overview of the process of creating the focused planning problem is shown in Fig. 3. In this figure, a user is seen assigning the agent a task to be accomplished. The problem generator creates a series of SPARQL-DL [55] queries which returns the relevant parts of the domain. The queries are designed to match the task that is given to a task name (i.e. to identify the methods/operators in our OWL-DL KB that decompose the given task). Once identified, all the methods and operators that may be used to decompose these methods are included in an ABox. The hierarchical nature of HTN enables such a comprehensive set of methods and operators to be identified. This continues until all possible methods and operators which may be called for to decompose the given task are included. For each of these methods and operators, the relevant predicates which would describe the initial state are also included. The modeling of the domain is crucial to enabling this process (see Sect. 4.1.1 for details).

Having created the planning problem in OWL-DL, a model-to-model transformation is made to convert it into the necessary JSHOP2 domain and problem files which are the input to the JSHOP2 planner. The planner itself has been slightly modified to produce an extended plan: the plan itself as well as the add and delete lists for each action. These lists allow the agent to update the KB with the changes each action is making to the world when it is successfully executed. Another extension provides the reasons the planning process may have failed. This explanation is crucial for triggering the substitution process and is detailed in Sect. 4.2. Once a plan is successfully generated, the agent uses it to guide it as it acts.

4.1.1 OWL-DL Models

The modeling of the domain starts with a top-down approach based on a task to be accomplished. The methods and operators are defined in JSHOP2, and the domain tested. M2M would transform the methods and operators into OWL-DL. The modeling of the planning domain in OWL-DL currently adheres to the model detailed in [20]. New objects which are referred to in the domain may need to be created. The objects, their properties and locations are modeled in OWL-DL based on their dictionary definitions.

The terminological knowledge specified in the TBox is based on the upper ontology of [59]. Thus, it includes concepts such as $\mathsf {SpatialThing-Localized}$. The concepts (or classes) specified here are used throughout the software architecture by the various components such as planning, navigation, manipulation and perception although each component may use additional KBs (e.g. OCL in the case of the grasp planner). The assertional knowledge, specified in the ABox, constitutes the state of the world. When changes occur in the environment and are perceived, the ABox is updated accordingly. Following OWL conventions, class names are capitalized while instances start with lower case-letters.

As mentioned above, the modeling of the planning domain is based on the work of [20]. The planning problem, domain, methods and operators are intuitively modeled as follows:

The $\mathsf {useState}$ property explicitly specifies the objects and their properties which are relevant when using the method or operator (i.e. the relevant subset of predicates to be included in the description of the initial state). An example of the JSHOP2 method $\mathsf {swap}$, and its two operators $\mathsf {pickup}$ and $\mathsf {drop}$ (see the $\mathsf {basic}$ problem accompanying the JSHOP2 planner) is given here along with its OWL-DL counterpart. The JSHOP2 code is actually included within the datatype property $\mathsf {shop2code}$ of the OWL-DL models:

Algorithms 2.1 and 2.2 in [20] specify how the transformation process from OWL-DL to the JSHOP2 planning domain and problem is made. The $\mathsf {useState}$ properties must be modeled after the transformation as they are not part of the JSHOP2 syntax.

While this representation allows a focused planning problem to be created, it could be improved upon to avoid the inclusion of the JSHOP2 code directly in the OWL representation. Hence our goal of replacing this representation with one that uses M2M. The M2M process is accomplished by first deriving metamodels of JSHOP2 in OWL-DL and in Ecore [60]. These metamodels provide a common representation. ATL [19] is then used to create a template which would transform a source model into the target model. Thus enabling automatic transformations in both directions. Additionally, the model-driven approach allows us to cleanly separate the JSHOP2 domain models and the M2M transformations from their implementation in a general-purpose programming language such as Java or C++.

We extend operator descriptions by allowing for the specification of the functional role of an action, so that motion and grasp planners can provide suitable target poses. The operator $\mathsf {goTo(?kettle, ForGrasping)}$ for example, communicates the reason why the agent needs to go to the kettle. This information is used by the motion planner to take into consideration the necessary constraints in identifying the final pose to reach. These constraints may also be influenced by the perception components (the robot needs to be in a position that would allow it to localize the kettle in order to proceed with the grasping action). In addition, a constraint-based system for grasping [52] uses the information to verify that the action can indeed be performed using the specified object and with the specified hardware. For example, the action $\mathsf {grasp(?cleanerBottle,ToSpray)}$ would trigger the system to validate that such a task is possible. This is important as faults can occur at any time and it is therefore not sufficient to specify capabilities once. Moreover, these capabilities depend on the agent, the object and the intended use, hence the use of affordances to link them. In this paper, we show how the task planner’s inclusion of this information is passed to the manipulation and navigation components. For details on the manipulation component’s use of this information, please see [52]. The ability of the navigation component to use the task information for deriving a good end pose maximizes the chances to accomplish the required task.

In Sect. 2.2, we highlighted the various reasons why dictionary definitions provide a valuable basis on which to model objects, their parts and their functional affordances. The models for a cup and a teacup, for example, are given here in Manchester OWL Syntax:

Functional affordances are modeled by the $\mathsf {isUsedFor}$ property. As a defined class, any instance that is a member of the $\mathsf {Cup}$ class and that is used to drink tea from will be inferred by the reasoner to be a $\mathsf {Teacup}$. These are necessary and sufficient conditions that enable the KB to be used for inference and not simply as a database.

In addition to these properties, the $\mathsf {Teacup}$ concept also inherits various properties from superclass concepts (including additional $\mathsf {isUsedFor}$ properties). It inherits the properties shown above for the $\mathsf {Cup}$ class and the $\mathsf {isUsedFor\,some\,Holding}$ and $\mathsf {isUsedFor\,some}$ $\mathsf {Transporting}$ properties from the class $\mathsf {Container}$. Additional properties from other superclasses include $\mathsf {hasStorageLocation}$ and $\mathsf {belongsTo}$ for example.

$\mathsf {Holding}$, $\mathsf {Transporting}$, and $\mathsf {DrinkingFrom}$ are all examples of the $\mathsf {ActionOnObject}$ concept. This is the superclass of all the functional affordances of objects and parts of objects.

The $\mathsf {isAlsoUsedFor}$ property enables new functional affordances to be learned through experience, for example, by having been successfully substituted for a task. A bottle is defined as “a container, typically made of glass or plastic and with a narrow neck, used for storing drinks or other liquids” [33]. Should a user drink from the bottle, the property

$$\begin{aligned} \mathsf {isAlsoUsedFor\,\,some\,\,DrinkingFrom} \end{aligned}$$

would be added to the $\mathsf {Bottle}$ concept. This also provides a quick way to look up which objects have previously been approved as substitutes and thus, allows the transfer and re-use of this knowledge.

In many cases this simple representation of functional affordances is sufficient. However, they are often n-ary properties. For example, $\mathsf {Teacup}$ and $\mathsf {DrinkingFrom}$ are related via the $\mathsf {isUsedFor}$ property, but in addition, they are related to the $\mathsf {Tea}$ concept. OWL properties link two concepts and so representing a relationship that exists between multiple concepts simultaneously requires a slightly different approach. One such approach is through reification. [53] provides guidelines to aid domain experts in deciding when and how to reify n-ary predicates. Our current model is being extended with the aid of these guidelines.

4.2 Phase II: Finding Viable Substitutes

Having shown how we model the domain, we now look into how the models support the identification of viable substitutes. This is accomplished by expanding the domain systematically by climbing the flexibility ladder mentioned in Sect. 3.2. The process is triggered in two situations. The first is when the planner fails to generate a plan, and the second is when a plan’s execution fails (see Fig. 4).

Unfortunately, things can, and often do, go wrong during the planning process itself. As mentioned above, the JSHOP2 planner has been modified to provide the explanations as to why it has failed to generate a plan:

1.
No operator can achieve this task
2.
No binding which satisfies a given precondition for the given operator could be found (e.g. $\mathsf {clean\, ?teacup}$)
3.
No method can achieve this task
4.
No binding which satisfies a given precondition for the given method could be found
5.
No branch^{Footnote 1} of a given method is applicable (preconditions not satisfied)

These include a reason for the failure, such as the failed preconditions, the bindings which have been made, as well as the task network up to the point of failure.

One likely scenario is that a method or operator requires that a precondition be met and the KB lacks the information to determine if it is or is not. For example, it contains no information on whether or not there exists an instance of a clean cup, even if it knows that all known instances are dirty.

Humans faced with the same situation would still attempt to proceed with their plan and assume that they will adapt as need be. As long as there are instances of the object, we attempt to generate a plan. If the locations of the instances are unknown, this flexibility is achieved by querying a semantic map and creating placeholder instances as needed at the most probable location. For example, a placeholder instance of $\mathsf {clean\, cup}$ may be instantiated $\mathsf {in\, cupboard4}$ in the ABox. This allows the planning process to proceed and increases the chances of successfully getting the job done.

Assuming we have sufficient information within our KB to determine if preconditions are met or not, and even with the help of these placeholder instances, the planner may still fail to generate a plan due to a failed precondition. For example, all cups may in fact be dirty. In such a case, humans would consider either washing a cup or making a substitution that would allow them to get the job done. They may use a mug, for instance. Here, we assume that, if a method allowing the robot to wash a cup while making tea was desirable to the user, it would have been included in the tea-making method decomposition and $\mathsf {dirty\, teacup}$ would be the filter condition that would allow that branch to be taken. However, in our domain, washing the cup is undesirable (because our robot simply cannot wash the cup and the dishwasher would take too long) and so the agent has no choice but to attempt a substitution.

If we recall how our planning problem was generated, the initial state is constrained and only contains $\mathsf {Teacup}$ instances, as that is what the methods and operators specify. A domain-expansion process is triggered to include the next best possible substitute for a teacup. In addition, a means to enable the new object to bind to the variables in the existing methods and operators needs to be found. This is necessary to handle cases such as the one described above where filling a kettle to water plants involves additional actions which are missing from the original plant-watering domain.

In [3], we first identified the need to use lifting in our planning approach to enable the desired flexibility and capitalize on the affordances that are encountered at execution time. In [4], we identified the placeholder mechanism to implement this. Here, we distinguish between the two cases of unmet preconditions and incomplete information and outline the solution of using a placeholder object in the latter case.

These mechanisms allow the agent to handle cases where the plan generation process has failed due to a missing instance of an object. Coupled with the behavior described in Sect. 4.3, they also provide the flexibility to use any instance at acting time and thus enable opportunistic behavior.

4.3 Phase III: Acting

Once a plan is generated, each action is executed and monitored to validate its outcome before the next action is handled. Each JSHOP2 operator has a corresponding skill coordinator. The concept of skills that encapsulate robot functionalities is taken from the 3T architecture [40]. The Action Execution/Monitoring component shown in Fig. 4 consists of two sub-components: TaskManagment and Execution.

The TaskManagment sub-component maps the actions to the corresponding skill coordinators and activates them.

Each skill coordinator has a SMACH [12] state machine with three corresponding SMACH states: the first executes the action, the second executes the monitoring action and the third reports the final execution outcome of the planned action.

The Execution module has two outputs: ‘feedback’ (the up-to-date state of an action’s execution), and ‘result’ (the final status after the monitoring action has been executed). The state of execution of each action is sent to Plan-Based Control which decides how to proceed next. Details of the integration of planning, execution and monitoring in our system can be found in [54].

Making substitutions during the acting phase also makes it possible for agents to take advantage of opportunities, e.g. using a magazine that happens to be on the table as a coaster. In order to accomplish this, we need to combine both the execution of plans which have been generated through the deliberation process and reactive behaviors which may be triggered by affordance cues.

In addition to the KB, we use a simple blackboard to communicate lower-level information. In particular, affordance cues [14], in the form of conceptual space quality dimensions, are being posted by all artificial agents as they move through the environment. These cues are of varying complexity, from simple color hues which would cost very little in terms of perceptual processing to more complex concepts such as shape. They may have been picked up as part of the plan’s execution, and would be kept in the system for a given duration. Upon execution failure, the cues which are in close proximity can be used to identify viable candidates for substitutions. The same cues allow the agent to take advantage of opportunities as it carries out tasks during execution. This is what [31] call “serendipity” in a navigation domain. It is another motivating factor for a distinct affordance-based approach. In situations where little is truly within the agent’s control, it makes sense to consider what opportunities for action the environment affords, rather than considering those which it may or may not provide. For example, a cupboard full of glasses would guide the agent to grasp any of them. As Steedman points out in [57], “...it is probably better to look at those plans the situation affords, rather than backward chaining to conditions that there may be no way for you to satisfy...”. By having all agents post cues, information about the state of the world can be shared.

The same behavior can guide plan execution when things are going as planned, allowing the agent to take advantage of opportunities before failures occur. For example, cues associated with a drink bottle may have been picked up on the way to the location specified in a plan. This ‘short cut’ could be taken advantage of, again depending on the flexibility that the human user has allowed. A cupboard full of glasses would guide the agent to grasp any of them, if no specific instance has been identified. In the case of execution failure, an agent might take the more ‘resourceful route’ of making a substitution or attempt to use the same object by finding other instances, or of using objects with the same functional affordance.

Taking advantage of opportunities by reacting to affordance cues has the added benefit of injecting that bit of randomness that often leads to improvements. Although our approach will use a plan library to avoid having to generate plans for the same routine tasks from scratch over and over again, exclusive reliance on these plans could lead to stagnation. This is a generally agreed upon disadvantage of using plan libraries. Changes in the world such as the availability of objects or a change in the state of the world may provide different ways for the task to be accomplished, but would never be considered if the same plan from the library is always reused. For example, buying an autonomous vacuum cleaner may render the continued use of the conventional one undesirable. The agent needs a decision mechanism that would decide when to use a plan from the library (possibly with adaptation) or generate a new plan from scratch, similar to the techniques surveyed by Meneguzzi and DeSilva [34].

Whether through interaction or through observation, an agent’s view of objects’ functional affordances will change. If we recall the possibility of learning an additional functional affordance for a bottle as mentioned in Sect. 2.2, we see that it is just as possible for the functional affordance to be unlearned or marked undesirable if the agent is so instructed. This developmental dimension that arises by considering the function of objects fits nicely with the profile of our desired domestic service robot.

One open question is when the agent should stop climbing the “flexibility ladder”, that is, when it should concede that it cannot achieve the task.

4.3.1 Action Substitution

In addition to our work on substituting objects, we have also investigated action substitution in the form of affordance-based action abstraction [23] where the agent learns which actions, or behavior instances, (as opposed to objects, as discussed in this paper) may be substituted and executed successfully in a given case. For example, the agent learned to predict which behavior instance (pick and place, push, pull, push with fingertips or turn), corresponding to the abstract operator $\mathsf {move}$, should be used in a given situation to move an object successfully. The preconditions and effects are learned and represented in a CS framework as described in Sect. 3.1. Behaviors with similar effects on an object are clustered together to form abstract operators which are used during the planning process. During execution, the context (i.e. the state before an action is executed) is compared to previous execution runs (previous contexts) and the behavior which is predicted to have the highest success rate is instantiated and performed. In the case of failures at execution time, the actions are substituted by instances from the same cluster (those with the next-highest success prediction). This work was evaluated in the OpenRAVE [10] simulator.

Our formalization of conceptual spaces is based on vector spaces to measure the similarity of actions, contexts, and outcomes, as in [43]. To complete the approach, a method for measuring the similarity between objects is required and work in progress. Therefore, the case study presented in Sect. 5 only exploits functional affordances and categories.

5 Evaluation

In this section we demonstrate that our approach allows our robot, Jenny, to plan for and carry out the tea-making task in our lab under various scenarios which would result in plan generation or execution failures using traditional approaches.

5.1 Setup of the Case Study

The robot platform used is a Care-O-bot 3 robot [18], an omni-wheeled platform with a 7-degrees-of-freedom manipulator and a three-fingered gripper, running ROS [42]. The lab environment consists of a living room, dining room and a fully-equipped and functional kitchen. The use of such a real-world environment, with many instances of the various kitchen objects, such as mugs and forks, serves to demonstrate that the approach can indeed handle real-world scenarios. The layout of the kitchen is shown in Fig. 5.

In the lab, teacups and mugs belong on $\mathsf {shelf2\text {-}1}$ in $\mathsf {cupboard2}$, teabags are in $\mathsf {box1}$ on $\mathsf {shelf1\text {-}1}$ in $\mathsf {cupboard1}$, $\mathsf {kettle1}$ is on $\mathsf {kettleBase}$ on $\mathsf {counter1}$, and the water can be filled from the tap in the kitchen sink. While this task in particular calls for water to be filled from the tap, this is avoided on the real robot due to safety reasons. Temporal issues are not dealt with in our work, so the filling of objects from the sink, or the pouring of the water from the kettle into a teacup for example, are simply executed using a counter. This does not influence the case study’s demonstration of our approach.

5.2 Making Tea

In the best case scenario, the task can be successfully decomposed and a plan, such as that shown in Listing 1 is generated. For the sake of simplicity, both $\mathsf {teacup2}$ and $\mathsf {peppermintTeabag}$ are asserted as being on $\mathsf {counter1}$. Otherwise, the navigation component would have driven to the cupboards where they are asserted to be stored and additional actions such as opening and closing the cupboard doors would have been included in the plan.

Note that the navigation-related operators also refer to the objects, e.g. $\mathsf {!goto\;kettle1\;ForGrasping}$, and do not explicitly refer to the location of the object. The reason for this is two-fold. First, planning is about deciding what the agent should do next: going to the kettle (as opposed to $\mathsf {counter1}$, or $\mathsf {kitchen}$) is an explicit description of what to do. Moreover, it is a more natural way of conveying where the agent is going when communicating the plan to a human being. Second, the agent’s navigation component is able to look up both its current location and that of the object in its own map.

The action $\mathsf {!goto\;kettle1\;ForGrasping}$ involves the navigation component looking up the pose of the object (i.e. where the kettle belongs—$\mathsf {counter2}$), calculating a desired pose to reach in order to grasp the kettle, plotting a path to it from the robot’s current location (possibly a pose in the living room) and driving there.

The $\mathsf {!access}$ operator allows the most up-to-date information to be used by both the navigation and manipulation components. This is because the implementation includes a perception action to locate, confirm and update the exact pose of an object at execution time, and to make any fine adjustments to the location of the robot based on this update.

As mentioned in Sect. 4.3, routine tasks such as making a particular user a cup of tea may involve a slightly different decomposition of the tea-making task. Let us assume for the scenarios below, that we are attempting to make a cup of tea for a user: $\mathsf {Iman}$ who likes her tea made in her favorite teacup $\mathsf {imansTeacup}$. The task is decomposed by a method $\mathsf {MakingTeaForIman}$. Table 1 summarizes the various cases used in the evaluation.

Table 1 Summary of cases in the study

Full size table

Missing predicates that serve as typing assertions (e.g.$\mathsf {kettle\;kettle1}$) and those that refer to the state of objects (e.g. $\mathsf {clean\;imansTeacup}$) can result in failure to generate plans (see Fig. 3). In the first scenarios (A and B), we omit these predicates from the description of the initial state. The planner communicates these unmet preconditions in its explanation of the failure and the control module is triggered to create a placeholder instance and the relevant predicates which are necessary (e.g. $\mathsf {clean\;imansTeacup}$) in the form of updates to the ABox. The problem generator then creates a new planning problem with the expanded domain which includes the placeholder object. Using our placeholder instances, a plan is generated and its execution attempted.

In cases A and B ‘incomplete knowledge’ is the problem. In Case C, however, a certain precondition is not met: $\mathsf {imansTeacup}$ is known to be dirty (i.e. not clean). A placeholder object is therefore of no help and lifting is needed. As the current resource is an instance of an object, the system climbs the flexibility ladder to the next step up and attempts to use any common instance of the object class: i.e. any $\mathsf {Teacup}$. If this fails, the climb continues iteratively.

Cases D, E and F reflect common problems that result in failures at execution time. They serve to demonstrate how our approach handles these problems. In Scenarios D and E, the monitoring process detects an error executing the $\mathsf {access}$ action which should detect the object and its pose. This is such a common error that in our system, a number of retries are attempted (including to widen the area around the expected location) before conceding that it has indeed failed and aborting the current plan. In both cases, the agent first communicates the status to the user and confirms their approval to find substitutes, and replan in order to carry out the task with a new object.

Case F reflects the case when a purposeful action has failed (e.g. $\mathsf {open\;kettle1}$). Once again, the user is informed, and an alternate behavior to achieve the action may be attempted (e.g. pushing the cover to open it instead of pulling).

For our experiments, we consider that our kitchen includes the items specified in Fig. 6. As the representation and calculation of conceptual similarity is work in progress, the case study shows the process of climbing the flexibility ladder to levels one and three (common instance and functional affordances). Figure 2 provides examples of what substitutions might be made for $\mathsf {my\_teacup}$ at each level of the ladder.

5.3 Analysis of the Results

The case study presented so far demonstrates the feasibility and viability of our approach. An interesting question is how our approach fares in comparison with others and whether it is possible to supply experimental results and a quantitative assessment of its benefits and limitations. This is not easy to do, and is still ongoing and future work.

With regards to common performance metrics used in the planning community, like plan length or plan costs, our approach shares the properties of JSHOP2 as the planning process itself has not been modified. Rather, our approach simply amends the planning problem being given to the planner. As there is the added computation to generate a more focused planning problem, the planner is usually able to produce plans faster. Our experience so far suggests that the savings made when planning outweigh the reasoning effort required to generate the focused planning problems. Demonstrating this in experiments remains as future work. However, a number of experiments in [20] support this.

An interesting criterion is whether the plans produced by our approach are more robust than those generated by the JSHOP2 planner without making use of a domain which is created using the placeholders and the lifting feature. In this respect, overlap [47] between an optimal plan produced using the original domain (without any substitutions) and the generated plans using the approaches presented here might be an interesting metric.

The primary performance evaluation metric of relevance to us is coverage. The evaluation shows precisely this: our approach is able to produce plans for problems that other planners, which do not make use of placeholders, lifting, and/or functional affordances, simply can not find solutions for. This metric highlights the benefits of our approach, but is difficult to assess quantitatively, in particular as it depends on the state of the world. We give a qualitative account of it in the following paragraphs.

The cases where planning fails (A–C) would generally result in no plans being generated. Our approach is able to generate plans by using placeholders in these situations. Cases D and E where the failure occurs during acting, would result in no plans being generated to remedy the failure without using the lifting feature. The results below show that the number of successfully generated plans in our case study is greater than those produced without our approach when faced with missing objects. It follows that the number of successfully carried out tasks may also be greater given that no plans at all were generated using traditional approaches and the original problem definitions. Table 2 gives a summary of the number of solution plans generated in the various scenarios. The number of solutions is due to the number of instances with which the plans may be grounded.

Table 2 Number of solution plans for the various scenarios in the case study

Full size table

Case F is a special case in that it deals with action substitution as opposed to object substitution. While we briefly covered this topic in Sect. 4.3.1, it is beyond the scope of the work presented here.

In Case A, the kettle is missing during planning time. With a placeholder instance created, the planning process can proceed. It could be that during the acting phase, the kettle is not found (Case E). While it is possible to instead climb the flexibility ladder, it is not desirable as there still exists the possibility that a plan derived from the original user-specified domain may still be executable. This always represent the most-likely plan to be successfully carried out as the domain modeler expects. For this reason, the placeholder feature is used during planning in response to incomplete information.

The behavior specified in Table 1 for Case E is carried out. In our domain, we specified the existence of only one instance of class $\mathsf {Kettle}$. Lifting over class would therefore yield no additional substitutes. Lifting over functional affordance however would yield five pans in which water may be boiled in since both $\mathsf {Kettle}$ (“a vessel, usually made of metal and with a handle, used for boiling liquids or cooking foods” [33]) and $\mathsf {Pan}$ (“a container made of metal and used for cooking food in”[33]) have the property $\mathsf {isUsedFor\,some\,Cooking}$. Moreover, in the expansion of the domain, methods specific to filling a pan with water and boiling water in a pan are included and used in place of those for kettle.

Case B called for a particular instance of a class to be used but a precondition was not met due to incomplete information (demonstrated by the absence of both $\mathsf {clean\; imansTeacup}$ and $\mathsf {not clean\; imansTeacup}$). Due to the Open World Assumption, this missing predicate does not preclude the teacup’s being clean. The matter is therefore handled by least commitment where the missing predicate (as needed by a precondition) is created, a plan generated and the cleanliness of the object checked explicitly during the acting phase.

In Case C, the state of $\mathsf {imansTeacup}$ is known to be $\mathsf {not clean}$ and for this reason the planning process fails. A placeholder is only used when there is incomplete information. Here, the case therefore necessitates the use of the lifting feature. Lifting to level one of the flexibility ladder to use any common instance of the object yields one possibility: $\mathsf {gerhardsTeacup}$ which is known to be clean and empty, i.e. it satisfies the preconditions (see Fig. 6). The 15 solution plans shown in Table 2 for this case (and for Case D) represent all 15 clean and empty instances of the class $\mathsf {Cup}$. All types of cups share the same functional affordance: they are used for drinking from. In the case where no instance satisfies the preconditions, and some instances have unknown states, all of them would be candidates for the substitution (they are treated as placeholders).

Case D describes the scenario where the $\mathsf {!access}$ operator fails to detect $\mathsf {imansTeacup}$ during the acting phase, making it unavailable. The case can therefore equally represent the scenario where this check for cleanliness fails during the acting phase. Using lifting over class would provide at least one instance—$\mathsf {gerhardsTeacup}$ which has been asserted to be clean in the KB, and up to 15 possible instances when lifting via the functional affordance of $\mathsf {imansTeacup}$. These 15 instances represent the subset of all 50 instances of $\mathsf {Cup}$ that are clean and empty.

6 Discussion

As mentioned above, the focus of our work is to enable agents to handle unexpected situations more robustly by substituting objects as humans do. Traditionally, researchers have focused on learning the link between objects and actions, for example, by analyzing the link between object attributes and the actions they afford (as in [22, 56, 63]), through experimentation (e.g. [30, 46, 58, 62]), or by imitation/demonstration/action recognition (e.g. [9, 35]).

These approaches involve time-consuming processes, and a number of them may yield affordances which lead to the suboptimal use of objects. Moreover, they may neither be goal-oriented nor socially acceptable, as they ignore context/situation. Simply linking object attributes with the actions that they afford does not provide us with socially-expected or acceptable behavior. For example, the affordance rollable is often given to cylindrical parts, although rolling bottles is not a socially-expected behavior. Knowing that concave objects are fillable is obviously useful, but simply knowing that a spoon may be filled, does not make the task of watering a plant with one socially expected (or acceptable). Learning affordances by experimenting can confirm or repudiate the existence of an affordance, but says nothing about the affordance leading to socially-compliant behavior. That is not to say that experimentation is not an important part of development. It lets agents understand their own body’s movements and by so doing, facilitates learning by imitation [13].

Learning from demonstration is more suitable for socializing robots. The ongoing research in the field of action recognition is highly relevant. It lays the foundation, not just for equipping agents to learn how to perform new tasks by watching humans perform them, but for the more socially-complex ability of anticipating the actions of humans (as in [29, 38, 61]). Proactive behavior, in response to anticipating other agents’ actions, can be seen as opportunistic behavior. This research also facilitates the socialization process directly as the substitutes that humans themselves make in different situations can be learned by the agent.

The work carried out in the natural language understanding field is also of great importance as it would make available to the agent the vast quantity of written, as well as audio and video resources, in addition to facilitating communication between the lay user and artificial agents. Moreover, the study of language is inherently intertwined with our own understanding of ‘planned action’ [57]. Language describing human-object interactions [39] is already a focus.

Language is but part of the human-robot interaction process. Another vital component of social intelligence [41] is having agents pick up social cues, e.g. to recognize humans’ emotional states in order to respond effectively. This requires multimodal interaction such as gesture, gaze, head movements, vocal features, posture, proxemics and touch [41]. Together, such building blocks would allow more sophisticated behavior to emerge, such as perspective taking [49].

Systems that learn users’ personal preferences through repeated interactions with them (such as [32]) and manage various profiles are necessary. Much work already exists—our own online profiles and preferences are tracked and managed. We even actively facilitate this process via social media. Investigating the application of these methods to our human-robot interaction scenarios would be beneficial.

Given the stated goal of placing service robots in domestic environments, it is surprising that marketing professionals, whose job it often is to place consumer goods in domestic environments, have not played a larger role. It is their job to know the consumer well, to be able to recognize the various target groups and to know what each group expects and would accept. It would seem that they are best placed to identify which out-of-the-box capabilities/features these groups would expect.

The importance of context has been emphasized throughout this work; however, the means by which to represent situations and context remain a focus of research. For the moment, our approach simply represents context implicitly with the plan library as the initial state of the world. The work of projects such as RACE [48] is therefore of vital importance as they tackle the complex issues of representing whole experiences (including context) and learning from them.

A more flexible representation of context should allow the agent to represent when, or rather when not to attempt a substitution. Since the context in which the task is to be accomplished may impact the desirability of making certain substitutions, for example the presence of guests, and although the user is given the opportunity to direct the agent not to make a substitution, the agent should be able to represent this decision and use it in the future. Representing the other case when certain instances of objects are simply not substitutable, e.g. the key to a room may not be substituted by the key to a different room is would also improve performance and usability over time.

Researchers in the inter-disciplinary field of normative multi-agent systems (nMAS) have reaffirmed the importance of norms being contextual [1] and have investigated both computational models of norms, and architectures which support their use, among others. The violation of norms, and the consequences of doing so, are a major theme within the field. In this work, this topic is not dealt with, nor do we take into consideration the concepts of obligation, prohibition, deadline, or role for example. The scope of our use of norms remains limited to the substitution behavior.

7 Conclusions

In this work, we have stressed the need for domestic service robots to acquire, assimilate and apply the social norms of the group with which they interact so that they can behave in socially-expected and accepted ways. We argued for the role that functional affordances should play in socializing artificial agents and discussed both the advantages of the approach and the limitations. We highlighted the benefits that come from enabling agents to make substitutions; and demonstrated how the use of functional affordances, conceptual similarity and spatial proximity allows agents to do so. We have capitalized on the synergies that result from the inclusion of affordances at various levels within our system: their representation which allows the agent to reason with them, their use in planning, in acting and their use in learning the preferences of those within their group.

Used in planning, we showed how some cases of plan generation failure can be avoided through the use of placeholder instances while other cases require the creation of lifted plans where the missing objects can either be substituted by any instance of the same class or by other objects with the same functional affordances. Moreover, we proposed a methodology that expands the set of possible substitutes by climbing the “flexibility ladder” resulting in socially-accepted behavior.

We showed how the use of functional affordances at execution time would allow the agent to take advantage of opportunities and how the functional affordances themselves can be modified and extended through interaction with the humans in the agent’s group.

The results of the case study support our design choices and highlight the advantages of the approach. Where other agents may simply fail to generate a plan or to carry out a task when faced with missing or unavailable objects, our agent’s approach creates opportunities to get the job done.

In our ongoing research, we are investigating the representation of objects in conceptual spaces. In addition, we are extending the OWL-DL representation of the planning domain to allow a model-to-model transformation into JSHOP2 syntax.

Functional affordances are about use—both by the human and potentially by the robot. To date, few domestic service robots physically carry out tasks that go beyond pick-and-place scenarios, be it serving drinks or fetching objects. While scenarios such as these and others, have been challenging and continue to spur robotics research forward in leaps and bounds, the use of devices to carry out tasks around the house will allow the real power of functional affordances to be leveraged. When our robots can stir, fill, microwave, and do laundry, i.e. actually use the objects themselves, new scenarios will arise where the use of functional affordances may lead to fascinating emergent behaviors. A robot, with the relevant capabilities, may attempt to hang clothes out to dry on the line on hangers, instead of with pegs. It will show true knowledge transfer and adaptability—autonomously finding a new way to carry out a known task. In doing so, it will have learned a new way to decompose a task and added it to its ever-growing domain.

Notes

The JSHOP2 planner allows alternative method decompositions to be specified as branches. Each branch has a set of preconditions which need to be met for a particular decomposition to be applicable. This is a convenient shortcut which results in an ‘if, elseif’ structure.

References

Andrighetto G, Governatori G, Noriega P, van der Torre LWN (eds.) (2013) Normative Multi-Agent Systems. Dagstuhl Follow-Ups, Dagstuhl Follow-Ups, vol. 4. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany
Arkin RC (1998) Behavior-Based Robotics. Intelligent Robots and Autonomous Agents. MIT-Press, Cambridge, MA, USA
Google Scholar
Awaad I, Kraetzschmar GK, Hertzberg J (2013) Affordance-based reasoning in robot task planning. In: Planning and Robotics (PlanRob) Workshop at 23rd International Conference on Automated Planning and Scheduling (ICAPS)
Awaad I, Kraetzschmar GK, Hertzberg J (2013) Socializing robots: The role of functional affordances. In: International Workshop on Developmental Social Robotics (DevSoR): Reasoning about Human, Perspective, Affordances and Effort for Socially Situated Robots at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Awaad I, Kraetzschmar GK, Hertzberg J (2014) Finding ways to get the job done: An affordance-based approach. In: Proceedings of the 24th International Conference on Planning and Scheduling (ICAPS)
Beetz M, Hertzberg J, Ghallab M, Pollack ME (eds.) (2002) Advances in Plan-Based Control of Robotic Agents, International Seminar, Dagstuhl Castle, Germany, Lecture Notes in Computer Science, vol. 2466. Springer
Bradshaw JM, Feltovich PJ, Johnson M (2011) The handbook of human-machine interaction: a human-centered design approach, chap. 13. Farnham, Surrey, England; Burlington, VT: Ashgate, pp. 283–300
Cycorp: OpenCyc. Online at http://www.opencyc.org/ (http://www.opencyc.org/)
Delaitre V, Sivic J, Laptev I (2011) Learning person-object interactions for action recognition in still images. In: Advances in Neural Information Processing Systems (NIPS)
Diankov R (2010) Automated Construction of Robotic Manipulation Programs. Ph.D. thesis, Carnegie Mellon University, Robotics Institute
Erol K, Hendler J, Nau DS (1994) HTN planning: Complexity and expressivity. In: Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94). AAAI Press, pp. 1123–1128
Field T (2011) SMACH documentation. Online at http://www.ros.org/wiki/smach/Documentation
Fitzpatrick P, Metta G, Natale L, Rao S, Sandini G (2003) Learning about objects through action - initial steps towards artificial cognition. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 3140–3145
Fritz G, Paletta L, Dorffner G, Breithaupt R, Rome E (2006) Learning predictive features in affordance based robotic perception systems. In: Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pp. 3642–3647
Gärdenfors P (2004) How to Make the Semantic Web More Semantic. In: Proceedings of the Third International Conference on Formal Ontology in Information Systems (FOIS 2004), pp. 17–34
Gärdenfors P, Warglien M (2012) Using Conceptual Spaces to Model Actions and Events. Journal of Semantics 29(4):487–519
Article MATH Google Scholar
Gibson JJ (1979) The ecological approach to visual perception. Houghton Mifflin, Boston
Google Scholar
Graf B, Reiser U, Hägele M, Mauz K, Klein P (2009) Robotic Home Assistant Care-O-bot 3 - Product Vision and Innovation Platform. In: Advanced Robotics and its Social Impacts (ARSO), 2009 IEEE Workshop on, pp. 139–144 doi:10.1109/ARSO.2009.5587059
Group AR Atlas transformation language. Online at http://www.eclipse.org/atl/
Hartanto R (ed) (2011) A Hybrid Deliberative Layer for Robotic Agents: Fusing DL Reasoning with HTN Planning in Autonomous Robots. Springer-Verlag, Berlin, Heidelberg
Google Scholar
Hartson HR (2003) Cognitive, physical, sensory, and functional affordances in interaction design. Behaviour & IT 22(5):315–338
Google Scholar
Hermans T, Rehg JM, Bobick A (2011) Affordance prediction via learned object attributes. In: Workshop on Semantic Perception, Mapping, and Exploration at the IEEE International Conference on Robotics and Automation (ICRA)
Höller D (2013) Affordance-based action abstraction in robot planning. Master’s thesis, Bonn-Rhein-Sieg University of Applied Sciences
Holt JC (1964) How Children Fail. Pitman
Hubel N, Mohanarajah G, van de Molengraft R, Waibel M, D’Andrea R (2010) RoboEarth Project. Online at http://www.RoboEarth.org
Hutchins E (1995) Cognition in the wild. MIT Press, Cambridge, MA
Google Scholar
Ilghami O, Nau DS (2003) A General Approach to Synthesize Problem-Specific Planners. Tech. Rep. CS-TR-4597, UMIACS-TR-2004-40, University of Maryland
Janowicz K, Raubal M (2007) Affordance-based similarity measurement for entity types. In: Winter S, Duckham M, Kulik L, Kuipers B (eds) Spatial Information Theory, vol 4736. Springer-Verlag, Berlin Heidelberg, pp 133–151
Chapter Google Scholar
Koppula HS, Saxena A (2013) Anticipating human activities using object affordances for reactive robotic response. In: Proceedings of Robotics: Science and Systems (RSS)
Kraft D, Detry R, Pugeault N, Baseski E, Piater JH, Krüger N (2009) Learning objects and grasp affordances through autonomous exploration. In: Fritz M, Schiele B, Piater JH (eds) Computer Vision Systems, Lecture Notes in Computer Science, vol. 5815, Springer, pp. 235–244
Levihn M, Kaelbling LP, Lozano-Perez T, Stilman M (2013) Foresight and reconsideration in hierarchical planning and execution. In: Workshop on Cognitive Assistive Systems at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Mason M, Lopes MC (2011) Robot self-initiative and personalization by learning through repeated interactions. In: Proceedings of the 6th International Conference on Human-robot Interaction, HRI ’11. ACM, New York, NY, USA, pp 433–440
McKean E (ed) (2005) The New Oxford American Dictionary. Oxford University Press,
Meneguzzi F, De Silva L (2015) Planning in bdi agents: a survey of the integration of planning algorithms and agent reasoning. The Knowledge Engineering Review 30:1–44
Article Google Scholar
Moldovan B, Otterlo MV, Lopez PM, Santos-Victor J, Raedt LD (2011) Statistical relational learning of object affordances for robotic manipulation. In: ILP
Nelson DGK (1999) Attention to functional properties in toddlers’ naming and problem-solving. Cognitive Development 14(1):77–100
Article Google Scholar
Norman D (2002) The psychology of everyday things. Basic Books, New York
Google Scholar
Pandey AK (2012) Towards socially intelligent robots in human centered environment. Ph.D. thesis, University of Toulouse
Patel M, Ek CH, Kyriazis N, Argyros A, Valls Miro J, Kragic D (2013) Language for learning complex human-object interactions. In: IEEE International Conference on Robotics and Automation (ICRA)
Peter Bonasso R, James Firby R, Gat E, Kortenkamp D, Miller DP, Slack MG (1997) Experiences with an architecture for intelligent, reactive agents. Journal of Experimental & Theoretical Artificial Intelligence 9(2–3):237–256
Article Google Scholar
Poggi I, D’Errico F (2011) Social signals: A psychological perspective. In: Computer Analysis of Human Behavior. Springer, pp. 185–225
Quigley M, Conley K, Gerkey B, Faust J, Foote TB, Leibs J, Wheeler R, Ng AY (2009) ROS: an open-source robot operating system. In: Workshop on Open Source Software at the IEEE International Conference on Robotics and Automation (ICRA)
Raubal M (2004) Formalizing conceptual spaces. In: Varzi A, Vieu L (eds) Proceedings of the 3rd International Conference on Formal Ontology in Information Systems (FOIS 2004). Torino, Italy, pp 153–164
Raubal M, Moratz R (2008) A functional model for affordance-based agents. In: Rome E, Hertzberg J, Dorffner G (eds) Towards Affordance-Based Robot Control, Lecture Notes in Computer Science, vol 4760. Springer-Verlag, Berlin, Heidelberg, pp 91–105
Chapter Google Scholar
Reno RR, Cialdini RB, Kallgren CA (1993) The transsituational influence of social norms. Journal of Personality and Social Psychology 64
Ridge B, Skocaj D, Leonardis A (2010) Self-supervised cross-modal online learning of basic object affordances for developmental robotic systems. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 5047–5054
Roberts M, Howe A, Ray I (2014) Evaluating diversity in classical planning. In: Proccedings of the 24th International Conference on Planning and Scheduling (ICAPS)
Rockel S, Neumann B, Zhang J, Dubba K, Cohn A, Konecny S, Mansouri M, Pecora F, Saffiotti A, Günther M, Stock S, Hertzberg J, Tome A, Pinho A, Lopes LS, von Riegen S, Hotz L (2013) An ontology-based multi-level robot architecture for learning from experiences. In: Designing intelligent robots: reintegrating AI II
Ros R, Lemaignan S, Sisbot EA, Alami R, Steinwender J, Hamann K, Warneken F (2010) Which one? grounding the referent based on efficient human-robot interaction. In: 19th IEEE International Symposium in Robot and Human Interactive Communication
Russell S, Norvig P (2003) Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall,
Sapir E (1921) Language: An introduction to the study of speech. Harcourt, Brace and company, New York
Google Scholar
Schneider S (2013) Design of a declarative language for task-oriented grasping and tool-use with dextrous robotic hands. Master’s thesis, Bonn-Rhein-Sieg University of Applied Sciences, St. Augustin, Germany
Severi P, Fiadeiro J, Ekserdjian D (2011) Guiding the representation of n-ary relations in ontologies through aggregation, generalization and participation. Web Semantics: Science, Services and Agents on the World Wide Web 9(2)
Shpieva E, Awaad I (2013) Integrating the planning, execution and monitoring systems for a domestic service robot. In: Workshop on Roboterkontrollarchitekturen at Informatik
Sirin E, Parsia B (2007) SPARQL-DL: SPARQL Query for OWL-DL. In: Proceedings of the Third International Workshop on OWL: Experiences and Directions (OWLED ’07)
Stark M, Lies P, Zillich M, Wyatt J, Schiele B (2008) Functional object class detection based on learned affordance cues. 6th International Conference on Computer Vision Systems (ICVS), vol 5008. Springer, Berlin / Heidelberg, Santorini, Greece, pp 435–444
Steedman M (2002) Plans, affordances, and combinatory grammar. Linguistics and Philosophy 25
Sun J (2008) Object categorization for affordance prediction. Ph.D. thesis, Georgia Institute of Technology
Tenorth M, Beetz M (2009) KnowRob - knowledge processing for autonomous personal robots. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4261–4266
The Eclipse Foundation: Eclipse Modeling Framework Project Core. Online at http://www.eclipse.org/modeling/emf/?project=emf (2013)
Ugur E, Sahin E, Oztop E (2009) Predicting future object states using learned affordances. In: ISCIS, pp. 415–419
Ugur E, Sahin E, Oztop E (2011) Unsupervised learning of object affordances for planning in a mobile manipulation platform. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 4312–4317
Varadarajan K, Vincze M (2011) Object part segmentation and classification in range images for grasping. In: 15th International Conference on Advanced Robotics (ICAR), pp. 21–27
Zhang J, Patel VL (2006) Distributed cognition, representation, and affordance. Cognition and Pragmatics 14(2):333–341
Article Google Scholar

Download references

Acknowledgments

The authors thank Elizaveta Shpieva, Christian Tiefenau, Daniel Höller and Sven Schneider for their help in implementing some of the ideas presented here. The authors also thank Sven Schneider and Anastassia Küstenmacher for the many useful discussions. Parts of this publication have been previously published in [4, 5]. Iman Awaad gratefully acknowledges financial support provided by a PhD scholarship from the Graduate Institute of Bonn-Rhein-Sieg University.

Author information

Authors and Affiliations

Bonn-Rhein-Sieg University, Grantham-Allee 20, 53757, Sankt Augustin, Germany
Iman Awaad & Gerhard K. Kraetzschmar
Osnabrück University and DFKI RIC Osnabrück Branch, Albrechtstrae 28, 49076, Osnabrück, Germany
Joachim Hertzberg

Authors

Iman Awaad
View author publications
You can also search for this author in PubMed Google Scholar
Gerhard K. Kraetzschmar
View author publications
You can also search for this author in PubMed Google Scholar
Joachim Hertzberg
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Iman Awaad.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Awaad, I., Kraetzschmar, G.K. & Hertzberg, J. The Role of Functional Affordances in Socializing Robots. Int J of Soc Robotics 7, 421–438 (2015). https://doi.org/10.1007/s12369-015-0281-3

Download citation

Accepted: 13 January 2015
Published: 27 March 2015
Issue Date: August 2015
DOI: https://doi.org/10.1007/s12369-015-0281-3

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

The Role of Functional Affordances in Socializing Robots

Abstract

Similar content being viewed by others

An Affordance-Based Conceptual Framework for Spatial Behavior of Social Robots

An Extended Framework for Characterizing Social Robots

The Secret Life of Robots: Perspectives and Challenges for Robot’s Behaviours During Non-interactive Tasks

1 Introduction