1 Introduction

The introduction of social robots—defined as “autonomous or semi-autonomous robots that interact and communicate with humans by following the behavioural norms expected by the people with whom the robot is intended to interact” [5, p. 1]—is deeply affecting many fields of human life. Robots used in industrial processes, for instance, are modifying job roles and rules ([1, 50, 58]; see also [26]). The introduction of robots in other contexts, such as households, education, care assistance, and manufacturing, will presumably have an increasingly significant effect on human activities and roles as well. With this in mind, several researchers are working on implementing robots that are behaviourally and physically similar to human beings [2, 25, 38, 84]. Humanoid robots—the ultimate realization of these attempts—are characterized by human-like physical features, behaviours, and cognitive processes, which evoke the idea that they are “more like-me” [68].

The implementation of human-like artificial agents also supports scientific research in fields such as anthropology, psychology, ethics, sociology, engineering, informatics, mathematics, and physics. In psychology, for example, Scassellati [87, 88] suggests that human-like robots are beneficial to the study of typical and atypical human social development, allowing researchers to test cognitive, behavioural, and developmental psychological models. Human-like robots may serve both as “stimuli” that evoke social cognitive processes in humans [101] and as “agents” that humans can observe and interact with [103]. In this respect, Wykowska and colleagues [103] showed that—by combining experimental control and ecological validity—manipulating behavioural and physical parameters of humanoid robots during human–robot interaction can provide insightful information about social cognitive processes in the human brain. It was shown, for example, that while low-level perceptual mechanisms are activated similarly when observing artificial and natural agents, the activation of higher-order social cognitive mechanisms requires the artificial agents to be endowed with human-like features, which allow the emulation, in the human observer, of human-like behaviours.

By contributing to the understanding and identification of the precursors of human sociality and their ontogenesis, knowledge from developmental psychology is also crucial for building the psychological models used in human–robot interaction. Growing recognition of the role of developmental psychology in understanding human–robot interaction is evidenced by the substantial increase, over the last decades, of studies focussing on child–robot interaction (e.g., [7, 11, 12, 47, 48, 61, 71, 72, 77]). The results of these studies consistently show that children tend to attribute mental states to humanoid robots, treating the robots as human agents (for a review, see [64]). Connecting developmental psychology to the study of human–robot interaction from a multidisciplinary perspective, Itakura [42] proposed a new research domain named Developmental Cybernetics (DC; [42]; see also [45, 46, 76, 77]). Developmental Cybernetics explores child–robot interaction through theoretical frameworks that characterize the design of robots with the aim of facilitating these interactions [42, 46]. It focuses on three abilities that are critical for making a robot a social agent: theory of communication (ToC; [43, 44, 76]), theory of body (ToB; [70, 72]), and theory of mind (ToM; [42, 45, 46]). With particular regard to theory of mind, several contributions have demonstrated its significance for the development of social competences: this psychological ability makes it possible to understand one’s own and other people’s mental states (intentions, emotions, desires, beliefs), and to predict and interpret one’s own and others’ behaviours on the basis of such meta-representations ([21, 59, 80, 82, 102]; for a review, see [98]). Several behaviours and cognitive processes are linked to ToM: imitation [4, 8, 9, 83]; joint attention, pointing, and gaze-following [13, 14, 23, 67, 68, 78]; and intentionality understanding [4, 20, 30, 36]. Several studies have investigated the effect of these precursors on the interaction between humans and robots [22, 29, 40, 41, 49, 51, 69, 71, 72, 73, 78, 79, 90, 99]. For example, examining the influence of a humanoid robot’s online eye contact on humans’ reception of the robot—as indicated by self-reports—Kompatsiari and colleagues [53] showed that eye contact facilitates the attribution of human-like characteristics to the robot. Participants were sensitive to the artificial agent’s mutual gaze, feeling more engaged with the robot when mutual gaze was established. Okumura and colleagues [77] investigated the effect of the referential nature of gaze on information acquisition, comparing—through eye-tracking—human gaze with nonhuman (robot) gaze in 10- and 12-month-old infants. The findings showed that infants followed both human and robot gaze, although only human gaze affected information acquisition. In a more recent review of the literature, Wiese and colleagues [101] further argued that designing artificial agents so that they are perceived as intentional agents activates brain areas involved in social-cognitive processing, ultimately increasing the likelihood that robots are treated by humans as social partners. Creating artificial agents as intentional partners may indeed promote feelings of social connection, empathy, and prosociality [101].
The effect of intentionality on the human tendency to anthropomorphise things has also been shown in very young children. When young children interact with artefacts, these acquire the status of relational artefacts and are thought of by children as “alive” and as having an “intention” [93, 94]. Piaget [81] already suggested that children younger than 6 years tend to attribute consciousness—namely, the capability to feel and perceive—to objects, and that children also consider “alive” things that are inanimate for adults, if these objects serve a function or are used to reach a goal: a phenomenon called animism. In this respect, Katayama et al. [52] showed that, although children are generally able to discriminate between humans and robots, 5- and 6-year-old children tend to attribute biological and psychological properties to robots.

Following from these findings, and with the aim of specifically exploring young children’s behavioural responses and mental-state attribution to an artificial agent in an interactive setting, in the present study we investigated preschool children’s behaviour when interacting with a robot as compared to a human agent. Relational skills in social contexts are typically studied through the use of interactive games derived from game theory [15, 96]. One such game is the Ultimatum Game [31, 32]. Used with children, the Ultimatum Game (UG) serves to delineate the development of equity and negotiation ability in decision-making, thus focusing on sensitivity to fairness and aversion to inequality [27, 28]. The UG involves interacting with at least one other agent, and it is related to the development of ToM in that it recruits the ability to comprehend, detect, and anticipate the other’s behaviour [16, 17, 18, 19, 37, 60, 62, 63, 66, 89, 91, 95]. In the UG, the subject can play as either the proposer or the receiver. The general rule of the UG is that the proposer makes an offer, proposing a certain division of goods between him or her and the other player, i.e., the receiver. If the receiver refuses the offer, neither player gets anything, whereas if the offer is accepted, both players obtain the proposed division. Using the UG, it is thus possible to study interactive behaviour in reciprocal situations from a developmental perspective, comparing behaviour at different ages. Some studies suggest that infants are already responsive to unequal resource distribution at 19 months of age (e.g., [89]). Fehr and colleagues [28] showed that sensitivity to fairness increases from 3 to 8 years of age and that selfish behaviour decreases from 3–4 to 5–6 years, although most 5–6-year-olds still tend not to share (78% of the sample). Additionally, school-age children (> 7 years) generally behave more fairly when distributing resources and tend to share more than younger children [28, 39].
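For concreteness, the UG payoff rule described above can be made explicit in a few lines of code. This is a minimal sketch; the function name and arguments are illustrative only.

```python
def ultimatum_round(total: int, offer: int, accepted: bool) -> tuple:
    """One UG round: the proposer keeps `total - offer` and offers `offer`
    to the receiver; if the receiver refuses, neither player gets anything."""
    assert 0 <= offer <= total, "the offer must be a valid share of the goods"
    if accepted:
        return total - offer, offer  # both players obtain the proposed division
    return 0, 0                      # refusal: no one gains anything

# Example: a 7-3 split of 10 goods that the receiver refuses.
print(ultimatum_round(10, 3, accepted=False))  # -> (0, 0)
```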

Only a few studies have employed the UG paradigm with robots, and even then only in interaction with adults. Research on adults generally shows that participants behave similarly when playing the UG with another human or with a robotic agent, and that this similarity increases the more human-like features the robotic agent presents. Terada and Takeuchi [92], for example, reported that rejection rates in the UG were higher with a computer opponent than with a human or robotic opponent, suggesting that people may treat a robot as a reciprocal partner, much as when playing with another human. Nishio and colleagues [74] tested university students playing the UG with four distinct artificial agents, which differed in their degree of similarity to humans. Their study comprised four conditions: a computer terminal (Computer condition), a humanoid robot (Humanoid condition), an android robot (Android condition), and a human (Human condition). They included a mentalizing stimulus for each agent, consisting of four short interactional sentences pronounced by the agent when meeting the subject (e.g., “how are things going?”). Furthermore, the authors administered a post-experimental questionnaire to test how participants perceived the agents (i.e., human likeness or machine likeness). The authors examined the number of rejections of the agents’ proposals and the number of fair and unfair proposals made to the agents, both measures serving as indicators of the recognition of the agents as social entities. The results showed that, when the agent looked like a human, the fairness of the proposals and the refusal rate were similar to those observed when interacting with a human, particularly when the artificial agent was introduced by the interactional sentences. Additionally, Nitsch and Glassen [75] analysed a group of young adults who played the UG with a human agent and with a humanoid robot that behaved so as to be perceived as either animated or apathetic. Participants who played with the animated robot perceived the interactions as more positive and human-like than those in the apathetic condition. Robins and colleagues [84] also studied reciprocity in human–robot interaction in a sample of young adults using the UG and the Prisoner’s Dilemma. They showed that participants were equally reciprocal with the human and the robot, even though the robot cooperated more with humans in the Prisoner’s Dilemma task, indicating that human–human interactions nevertheless remain privileged over human–robot interactions. Finally, Takagishi and colleagues [91] examined the effect of a robot’s facial expressions in the UG and showed that emotional expressions affected the offers made by participants to the robot compared with offers made to a computer displaying a simple line-drawing face.

Altogether, the results of these studies suggest that adult subjects are inclined to interact with robotic agents in much the same fashion as with other humans. With respect to children’s behaviour, no studies have so far addressed children’s decision-making during an interactive and reciprocal social situation involving a robot. In the present study, we attempted to fill this developmental gap by exploring interactive behaviour during the UG when children played with either another child or a robot. The children in our sample were aged 5–6 years. At this age, children have acquired at least first-order ToM [98]. First-order ToM entails recursive thinking of low complexity, that is, the meta-representation of another’s mental representation, of the kind “I think that you think…”. We therefore hypothesized that children aged 5–6 years would be able to discriminate between the interactive partners as a function of agency (human vs. robot) on the basis of mental-state attribution. In the UG, the children played both as proposers and as receivers. Their performance on the UG was evaluated against several socio-demographic as well as cognitive factors. Finally, an analysis of children’s justifications for their behaviour when playing both as proposers and as receivers was carried out, outlining—for the first time—response patterns that characterize the rationale underlying these children’s game strategy.

2 Method

2.1 Participants

Thirty-three (33) Italian kindergarten children participated in the experiment. Two children were excluded from the study: one failed the First-Order ToM task and the other failed the Inhibition task (see below). Data analysis was therefore carried out on 31 children (males = 18, females = 13; mean age = 70.8 months, SD = 2.99 months). The children attended three different schools located in northern Italy. The children’s parents received a written explanation of the study procedure, the measurement items, and the materials used, and they gave written consent. None of the children were reported by teachers or parents for learning and/or socio-relational difficulties. The study was approved by the local Ethics Committee (Università Cattolica del Sacro Cuore, Milan).

2.2 General Procedure and Measures

The children were tested individually in a quiet room in their kindergartens. Sessions were administered by a single researcher, both in the morning and in the afternoon, during normal school activity. The children were assessed in two experimental sessions on different days within two weeks. Administering the tasks in two separate sessions was intended to avoid overly tiring the children with an excessively long procedure, as well as to avoid disrupting their normal school activities.

In the first session, children were administered the following tests: a First-Order False-Belief task and two First-Order False-Belief videos, assessing first-order theory of mind; two Strange Stories, assessing advanced theory of mind; and the Attribution of Mental and Physical States (AMPS) scale. In the second session, children were administered the following tests: the Family Affluence Scale (FAS), assessing socioeconomic status; two executive-function tests, assessing behavioural inhibition and working memory; two further First-Order False-Belief videos; two additional Strange Stories; and the Ultimatum Game, assessing fairness.

The tests were distributed across sessions so as to balance the total time required to complete the whole battery, which was approximately 45–50 min, with each session lasting about 25 min. Within each session, the order of test administration was randomized across children.

2.3 Socioeconomic Status

To assess children’s socio-economic status, we used the Italian translation of the Family Affluence Scale (FAS; [10, 97]). This test comprises four items measuring family wealth: (1) family car ownership (score range 0–2); (2) having one’s own bedroom (range 0–1); (3) number of computers at home (range 0–3); and (4) number of vacations taken during the past year (range 0–3). The FAS total score was obtained by adding up the four item scores (total range 0–9).
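As a minimal scoring sketch, the additive rule above can be expressed as follows; the dictionary keys are hypothetical, while the per-item caps mirror the ranges just listed.

```python
# Hypothetical item keys; score caps follow the ranges given above.
FAS_ITEM_CAPS = {"cars": 2, "own_bedroom": 1, "computers": 3, "vacations": 3}

def fas_total(answers: dict) -> int:
    """Sum the four FAS item scores (total range 0-9), clipping each item
    to its allowed range."""
    return sum(min(answers[item], cap) for item, cap in FAS_ITEM_CAPS.items())

print(fas_total({"cars": 1, "own_bedroom": 1, "computers": 2, "vacations": 3}))  # -> 7
```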

2.4 Executive Functions (Inhibition and Working Memory)

We tested the children’s executive functions using the Inhibition subtest of the NEPSY-II (“A Developmental NEuroPSYchological Assessment”; [54]), which tests the ability to inhibit automatic responses in favour of novel responses and the ability to switch between response types. The child looks at a series of black and white shapes or arrows and names either the shape or direction, or gives an alternate response, depending on the colour of the shape or arrow. In the present study, we used the combined score of the Inhibition NEPSY-II subtest, which combines accuracy and speed of response.

Furthermore, we used the Backward Digit Span to test working memory. The experimenter verbally presented the child with sequences of numbers that the child had to repeat in reverse order. The test started with three sequences of two digits and gradually introduced triples of sequences of increasing length, up to a maximum possible length of eight digits. The test stopped when the child failed all three sequences of the same length. Scoring was as follows: children received 1 point when they correctly reversed at least two out of three sequences of a given length, and 0.33 when they correctly reversed only one out of three; in either case, the child then passed to the next longer series. The decimal scores were summed to obtain the final score (e.g., 1 + 0.33 = 1.33), with three partial credits counting as a full point (e.g., 1 + 0.33 + 0.33 + 0.33 = 2). For a detailed description of the scoring criteria, see Korkman et al. [54].
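The scoring rule can be made explicit in code. The sketch below reflects our reading of the description above (1 point for at least two of three correct reversals at a given length, 0.33 for exactly one, stop when all three fail, and three partial credits counting as a full point); it is illustrative, not the NEPSY-II implementation.

```python
def digit_span_score(correct_per_length):
    """`correct_per_length[i]` = number of correctly reversed sequences
    (0-3) at the i-th length level, in ascending order of length."""
    full_points, partial_credits = 0, 0
    for n_correct in correct_per_length:
        if n_correct == 0:        # all three sequences failed: test stops
            break
        if n_correct >= 2:
            full_points += 1      # at least two of three reversed
        else:
            partial_credits += 1  # exactly one of three reversed
    # Per the example above, three 0.33 credits add up to a full point.
    full_points += partial_credits // 3
    return full_points + 0.33 * (partial_credits % 3)

print(digit_span_score([3, 2, 1]))  # -> 2.33 (1 + 1 + 0.33)
```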

2.5 First-Order False Belief Task

To assess first-order ToM, we used the classical Unexpected Transfer task [102]. The experimenter told the story using two dolls (one male and one female), a ball, a box, and a basket. The classical story is about two siblings playing with a ball in a room. One of the children puts the ball in a box and leaves the room. Meanwhile, the other child takes the ball out of the box, puts it in the basket, and goes away. Finally, the first character comes back into the room and wants to play with the ball. At the end of the story, the experimenter asks the child the following questions: “Where will he look for the ball first?”—referring to the first character (First-Order False Belief question); “Where did the child put the ball before leaving the room?” (memory control question); “Where is the ball really?” (reality control question).

The answers to the two control questions (memory and reality) were used to screen children’s performance: children who did not pass them scored 0, and their performance on the First-Order False Belief question was not evaluated. If children passed the control questions, the false-belief test question scored 1 if answered correctly and 0 if not.
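In code, this gated scoring amounts to a couple of lines (a compact sketch under the rule just described):

```python
def false_belief_score(memory_ok: bool, reality_ok: bool, belief_ok: bool) -> int:
    """Control questions gate the test question: a child who fails either
    control scores 0 and the false-belief answer is not evaluated."""
    if not (memory_ok and reality_ok):
        return 0
    return 1 if belief_ok else 0
```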

2.6 First-Order False Belief: Video Stimuli

To assess children on first-order false belief, we introduced—besides the commonly used task described above—a new tool, namely eight videos representing the classical story, this time introducing the robot Robovie as a new character. The aim of the videos was twofold: on the one hand, we aimed to compare the video version of the false-belief task with the classical storyboard narration; on the other hand, by comparing children’s attribution of a first-order false belief to another child and to a robot, we aimed to assess whether belief attribution to the robot substantially differs from belief attribution to the child, thus highlighting children’s recognition of different mental properties. To these purposes, we used a story similar to the classical Unexpected Transfer task [102] as the video script. Each video started by presenting two characters (two children, two robots, or a child and a robot) in a room with two boxes of different colours (blue and pink) placed in front of them. Subsequently, one of the characters put a teddy bear under one of the two boxes (the same box in all videos) and then left the room. When the other character was alone in the room, he, she, or it moved the teddy bear under the other box and left the room. Finally, the first character returned to the room to retrieve the teddy bear. A voiceover narration described the story in each video.

A frequency analysis showed that all children attributed a false belief to the child regardless of whether the task was administered as a storyboard or as a video, suggesting that the storyboard narration is equivalent to the video format in terms of task effect. Additionally, all children attributed a false belief to the robot as well, regardless of the role played by the robot (i.e., making or undergoing the transfer, with a human agent or with another robot), suggesting that children regard the robot, like the human, as subject to limitations of informational access. Details about the video recordings are given in the Supplementary Material (S1). The videos can be found in the Electronic Supplementary Material.

2.7 Strange Stories

We assessed children’s advanced ToM using the Strange Stories task [33, 100], whose Italian translation had already been used in other studies [21, 55]. This task evaluates the ability to make inferences about mental states by interpreting non-literal statements. We selected a subgroup of four mentalistic stories involving double bluffs, misunderstandings, white lies, and persuasion. After reading the stories, we asked the children to explain the characters’ behaviour, without a time limit. The experimenter transcribed the children’s answers verbatim. Scoring followed the general guidelines [100]: 0 for incorrect answers, 1 for partially correct answers, and 2 for full and explicit answers. Two judges independently coded 30% of the responses. Inter-rater agreement was substantial (Cronbach’s Alpha = 0.96); disagreements were resolved through discussion between the judges. Total scores ranged from 0 to 8. The analysis of children’s performance on the Strange Stories revealed a rather poor overall performance: children scored very low, ranging from 0 to 3 out of a maximum possible score of 8.
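Cronbach’s alpha for the double-coded subset can be computed from the standard formula α = k/(k−1) · (1 − Σσ²ᵢ/σ²ₜ), treating the two judges as “items”. A self-contained sketch, with made-up codings rather than the study’s data:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: (n_responses, n_raters) matrix of codings (here 0, 1, or 2)."""
    k = scores.shape[1]
    rater_vars = scores.var(axis=0, ddof=1).sum()  # sum of per-rater variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of summed scores
    return k / (k - 1) * (1 - rater_vars / total_var)

# Illustrative codings by two judges on ten responses (not the study's data).
demo = np.array([[2, 2], [1, 1], [0, 0], [2, 1], [1, 1],
                 [0, 0], [2, 2], [1, 1], [2, 2], [0, 1]])
print(round(cronbach_alpha(demo), 2))
```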

2.8 Attribution of Mental and Physical States

The Attribution of Mental and Physical States (AMPS) is a measure of the states that the child attributes to a character depicted in pictures. The AMPS is an ad-hoc scale drawn from the questionnaire described in Martini and colleagues [65]. The scale was used to assess which mental and physical features the child ascribed to the interactive agent (another child or the robot). The experimenter first showed the child two pictures, depicting a child and a robot, and then asked the child—as a control question—what was depicted in each picture (reality question). If the child correctly recognized the character, the test continued; if the child responded incorrectly, the experimenter corrected the child, making sure that he or she was fully aware of the agent’s identity, and then proceeded with the test. No children failed the reality question; that is, all children correctly recognized the child and the robot at first sight.

The presentation order of the two pictures was randomized across children. After presenting each picture and asking the reality question, the experimenter asked the child 25 questions about the agent’s states. The child had to respond “Yes” or “No” to each question. The 25 questions were grouped into five state categories: Epistemic, Emotional, Desires and Intentions, Imaginative, and Perceptive. The total score was the sum of the “Yes” answers (score range 0–25); the five partial scores were the sums of the “Yes” answers for each category (score range 0–5).
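A scoring sketch for the AMPS, with category names taken from the list above (the dictionary layout is ours):

```python
AMPS_CATEGORIES = ("epistemic", "emotional", "desires_intentions",
                   "imaginative", "perceptive")

def amps_scores(answers: dict) -> dict:
    """answers: category -> five booleans (True = 'Yes'). Returns the five
    partial scores (0-5 each) plus the total score (0-25)."""
    scores = {cat: sum(answers[cat]) for cat in AMPS_CATEGORIES}
    scores["total"] = sum(scores[cat] for cat in AMPS_CATEGORIES)
    return scores
```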

2.9 Ultimatum Game

A standard version of the Ultimatum Game (UG; [32]) was used to evaluate fairness. During the game, the child (playing as proposer) could decide how to distribute 10 stickers between him- or herself and a passive player represented by a graphic representation of either a child (human agent) or a robot. The graphic representation of the other child depicted a male figure when the subject was a boy and a female figure when the subject was a girl. Graphic representations of the interactive partner have already been used with children in previous UG studies (e.g., [16, 60]).

The stickers depicted animals known to the children (e.g., dogs, cats, tigers, lions, birds, fish). The number of stickers that could be offered during the game ranged between 1 and 9. After playing the UG, the child was actually given a final amount of stickers. Children played one round as proposers and five rounds as receivers, for a total of six rounds. Playing as proposer, the child could decide how to divide the stickers with the other child or the robot (graphic representation); playing as receiver, the child could decide whether to accept or refuse the proposed division. In case of acceptance, both players received the respective proposed amounts; in case of refusal, neither player gained anything.

When playing as receiver, each of the five rounds corresponded to a specific type of proposal, ranging from a 5–5 division (fair) to a 1–9 division (highly unfair). The order of the rounds was randomized across children. In each round, the child could accept or refuse the offered amount, scoring 1 in case of acceptance and 0 in case of refusal. The total amount gained through acceptances was also calculated for each child when playing as receiver (range 0–15). Similarly, when playing as proposer, the total amount offered was calculated for each child (range 0–9).
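The receiver-side bookkeeping can be sketched as follows, assuming the five fixed divisions run from 5–5 to 9–1 in (proposer, receiver) order, so that the maximum total gain is 5 + 4 + 3 + 2 + 1 = 15:

```python
DIVISIONS = [(5, 5), (6, 4), (7, 3), (8, 2), (9, 1)]  # (proposer, receiver)

def receiver_totals(accepted):
    """`accepted[i]`: whether the child accepted the i-th division.
    Returns (number of acceptances 0-5, total gain 0-15)."""
    n_acceptances = sum(accepted)
    total_gain = sum(share for (_, share), ok in zip(DIVISIONS, accepted) if ok)
    return n_acceptances, total_gain

print(receiver_totals([True, True, False, False, False]))  # -> (2, 9)
```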

2.9.1 UG Procedure

Before starting the game, the experimenter explained the game rules by presenting four different examples in graphic format: (1) an adult playing as proposer with a child, who accepted the offer; (2) an adult playing as proposer with a child, who refused the offer; (3) a robot playing as proposer with a child, who accepted the offer; and (4) a robot playing as proposer with a child, who refused the offer.

During the game, the researcher played the role of “the other child/robot”, proposing, accepting, or refusing offers. The children’s performance at the Ultimatum Game was evaluated in a single block lasting approximately 10 min.

At the end of the game, children were asked the reasons why they had made a certain offer (proposer) or accepted/refused a certain proposal (receiver). The children’s answers were classified according to three criteria: (1) outcome—the justification refers solely to the amount offered or received, without commenting on or referring to the other player in mental terms (e.g., “because I like that he does not have anything and I have them all”; “because I wanted to win”); (2) equity—the justification includes terms that refer explicitly to the construct of equity, regardless of whether the amount offered or received was equal or unequal (e.g., “because in this way we are even”; “because it was nicer to give him three and I like to share them”); (3) mentalistic—the justification includes terms that refer to the other player in mental terms (e.g., “because he would like me to receive more”; “he believed that I would win”).

3 Results and Discussion

All continuous variables were normally distributed, with skewness between −1 and 1.

3.1 AMPS

With this analysis, we compared the attribution of states to the human agent (the other child) and to the robot. Inspection of the boxplots revealed a tendency towards maximum scores (5) for state attributions to the human agent—particularly for the perceptive state, where we observed a ceiling effect—and a normal distribution for state attributions to the robot. A repeated-measures general linear model (GLM) was carried out with state (five levels: epistemic, emotional, desires and intentions, imaginative, perceptive) and agency (two levels: human vs. robot) as within-subject factors. The results showed a main effect of state (F(4,120) = 5.94, p < 0.0001, partial η² = 0.17, δ = 0.98), a main effect of agency (HA > RB; F(1,30) = 53.82, p < 0.0001, partial η² = 0.64, δ = 0.1), and a significant state × agency interaction (F(4,120) = 7.90, p < 0.0001, partial η² = 0.21, δ = 0.1). Post hoc analyses (Bonferroni corrected) showed that desires and intentions scored significantly higher than both the epistemic (Mdiff = 0.69, p < 0.01) and the imaginative (Mdiff = 0.52, p < 0.05) states; the perceptive state scored significantly higher than the imaginative state (Mdiff = 0.61, p < 0.01). Additionally, the analyses showed that the interaction effect stemmed primarily from differences between states within agency (HA, RB). More specifically, for state attributions to HA, the epistemic state scored higher than the imaginative state (Mdiff = 0.58, p < 0.01), and the perceptive state scored higher than all other states (p < 0.05) except the epistemic, for which the difference just failed to reach significance (Mdiff = 0.36, p = 0.055). For state attributions to RB, desires and intentions received higher scores than the epistemic and imaginative states (Mdiff = 1.23 and Mdiff = 1.00, respectively; p < 0.001). These results are summarized in Fig. 1.
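As a rough sketch, a 5 × 2 repeated-measures design like the one above could be run with statsmodels’ AnovaRM on a long-format table; the file and column names are hypothetical, and the Bonferroni post hocs would require additional pairwise tests.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table: one AMPS score per child, state, and agency.
# Columns: child_id, state (5 levels), agency ('HA'/'RB'), score (0-5).
df = pd.read_csv("amps_long.csv")

res = AnovaRM(df, depvar="score", subject="child_id",
              within=["state", "agency"]).fit()
print(res)  # F tests for state, agency, and the state x agency interaction
```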

Fig. 1 Mean scores for the attribution of states (epistemic, emotional, desires and intentions, imaginative, and perceptive) to the human agent (HA) and to the robot (RB). Bars represent the standard error of the mean

The results on state attribution to HA and RB clearly showed that children ascribed greater epistemic, emotional, desire- and intention-related, imaginative, and perceptive properties to the human agent than to the robot, therefore considering the latter a distinct entity (see [24]). Children tended to ascribe poor imagination to both the child and the robot, in line with evidence showing that young children’s imaginative abilities mature gradually [34, 35, 36, 56, 57]. Interestingly, children tended to ascribe desires and intentions to the robot as well, although to a significantly lesser extent than to the human agent. This finding is congruent with evidence of young children’s tendency to ascribe intention to non-living things [81]. This tendency could have been enhanced in this study because, during the ToM videos (see Sect. 2), children observed the robot performing real actions. In this respect, from a neurophysiological perspective, movement and intention go hand in hand ([40, 83]; see also [22, 86]), as also suggested by psychophysiological evidence in children [20]. At the same time, while attributing intentions to the robot, children ascribed to it a low ability to think, understand, and decide (epistemic attribution). These apparently contrasting results are in fact much in line with the idea of animism introduced above, namely young children’s tendency to attribute a living soul to plants, inanimate objects, and natural phenomena, while being aware that these entities are non-living (see [52, 77, 92]).

3.2 Ultimatum Game

The main aim of the present study was to assess whether preschool children behave differently when interacting with a human agent or with a robot. We therefore had children play the UG with either another child or the robot, both as proposers and as receivers.

3.2.1 Proposer

A t test comparing the total amount offered to the other child and to the robot when the participant played as proposer showed no significant difference (HA: M = 3.71, SE = 0.31; RB: M = 3.55, SE = 0.35; p > 0.05). Reinforcing this result, we found high internal consistency (Cronbach’s α = 0.85) between each child’s offers to the human agent and to the robot. Table 1 reports children’s proposed divisions to HA and RB.
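A sketch of such a paired comparison with SciPy, using placeholder offer arrays rather than the study’s data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
offers_ha = rng.integers(1, 10, size=31)  # placeholder offers to the child
offers_rb = rng.integers(1, 10, size=31)  # placeholder offers to the robot

t, p = stats.ttest_rel(offers_ha, offers_rb)  # paired-samples t test
print(f"t = {t:.2f}, p = {p:.3f}")
```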

Table 1 Statistics of children’s offers when playing as proposers

These results are in line with the literature (e.g., [28]) showing that pre-schoolers tend to make offers that favour their own gain, although the proportion of children who made offers approaching fairness (i.e., ranging between 4–6 and 6–4) was quite substantial (68% towards HA; 55% towards RB). Additionally, these results extend previous findings by showing that 5-year-old children’s behaviour and game strategy are consistent regardless of whether children play with another child or with a robot, in line with the large body of existing literature on adults introduced above.

3.2.2 Receiver

For this analysis, we summed all acceptances when children played as receivers, calculating the total amount gained by each child. A t test comparing the totals accepted from the other child and from the robot showed no difference as a function of agency (t = 0.75, p > 0.05), indicating that children accepted similar amounts when playing with HA (M = 10.97, SE = 0.77) and with RB (M = 10.45, SE = 0.95). This result is congruent with the data described above, showing children’s tendency to adopt a similar game strategy when playing with the other child and with the robot.

Specifically comparing children’s acceptances of each proposed division from HA and RB, the results confirmed no differences in the frequency of acceptances as a function of agency within divisions (for statistics, see Table 2). Additionally, comparing the frequency of acceptances between successive divisions for both HA and RB, the Wilcoxon test showed no significant differences for the child (i.e., 5–5 vs. 6–4; 6–4 vs. 7–3; etc.; p > 0.05), whereas we observed significant differences for the robot. More precisely, acceptances of the 6–4 division were significantly lower than acceptances of the 5–5 division (71% vs. 90%, p < 0.01), as were acceptances of the 7–3 division compared with the 6–4 division (52% vs. 71%, p < 0.01). This result reveals a significant, although minor, tendency of children to accept slightly unfair offers more often from another child than from the robot.
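The successive-division comparisons could be sketched with SciPy’s paired Wilcoxon signed-rank test on the children’s 0/1 acceptances; the data below are placeholders, not the study’s.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
acc_55 = rng.integers(0, 2, size=31)  # placeholder acceptances of the 5-5 division
acc_64 = rng.integers(0, 2, size=31)  # placeholder acceptances of the 6-4 division

stat, p = stats.wilcoxon(acc_55, acc_64)  # 5-5 vs. 6-4, paired within children
print(f"W = {stat:.1f}, p = {p:.3f}")
```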

Table 2 Comparison of refusals and acceptances of proposed divisions during the UG when the child played as receiver with another child (HA) and with the robot (RB) (N = 31)

3.3 Correlations

Pearson’s correlations were computed between the total UG scores when children played with the other child and with the robot, both as proposer and as receiver. As expected, the results showed a substantial correlation between the total amount proposed to the other child and to the robot (r² = 0.56, p < 0.01), as well as a strong correlation between the total amount accepted from the other child and from the robot (r² = 0.49, p < 0.01). Additionally, the UG scores were correlated with demographic, cognitive, and socio-economic variables, including age (in months), the child’s gender, and scores on working memory, the inhibition task, the intentionality test, and the Family Affluence Scale. These correlations are summarized in Table 3. No significant correlations were observed between the UG scores and any of these variables (p > 0.05).
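A sketch of this correlation analysis with pandas; the file and column names are ours, not the study’s.

```python
import pandas as pd

df = pd.read_csv("ug_scores.csv")  # hypothetical wide-format table, one row per child
cols = ["prop_rb", "prop_ha", "recv_rb", "recv_ha",
        "age_months", "working_memory", "inhibition", "intentionality", "fas"]
print(df[cols].corr(method="pearson").round(2))  # Pearson correlation matrix
```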

Table 3 Correlations between the total Ultimatum Game scores when the child played as proposer with the robot (1) and with the child (2), and as receiver with the robot (3) and with the child (4)

3.4 Analysis of Children’s Justifications at the UG

After the child had played the UG as both proposer and receiver, the experimenter asked him or her to explain the reasons for his or her choices. The children’s answers were classified as “outcome”, “equity”, or “mentalistic” (for details, see Sect. 2). Two independent judges evaluated the children’s justifications.

3.4.1 Proposer

Inter-rater reliability was substantial both for children’s justifications when playing with the other child (Cronbach’s Alpha = 0.88) and when playing with the robot (Cronbach’s Alpha = 0.91).

Analysing the frequencies of children’s justifications when making an offer (proposer), the results showed an even distribution of justification types (outcome, equity) when children offered an exactly fair amount (5–5 division), whereas the distribution skewed towards “outcome” when children placed offers that privileged their own gain. This held regardless of whether children played with the HA or the RB. A summary of the frequency distribution of justification types for each proposed division is presented in Table 4(A).

Table 4 Frequency of children whose justifications for their proposals (A) and acceptances/refusals (B) were categorized as (1) outcome—the justification refers solely to the amount offered or received, without commenting on or referring to the other player in mental terms; (2) equity—the justification includes terms that refer explicitly to the construct of equity, regardless of whether the amount offered or received was equal or unequal; (3) mentalistic—the justification includes terms that refer to the other player in mental terms

3.4.2 Receiver

Inter-rater reliability scores were calculated for each proposed division (5–5; 6–4; 7–3; 8–2; 9–1) when children played as receivers. The results showed substantial agreement on children’s justifications of acceptances/refusals both when interacting with the other child (mean Cronbach’s Alpha = 0.83; min = 0.82, max = 0.84) and when interacting with the robot (mean Cronbach’s Alpha = 0.83; min = 0.80, max = 0.84).

Comparing the frequency of justification types when children accepted or refused the proposed divisions, we found that, for a 5–5 division, justifications were based either on “outcome” (HA: N = 13; RB: N = 14) or on “equity” (HA: N = 18; RB: N = 16); only one child gave a mentalization-based justification, when interacting with the robot. When faced with a 6–4 division, the majority of children (HA: N = 26; RB: N = 28) gave an “outcome”-based justification regardless of acceptance or refusal; five children gave an “equity”-based justification and none a “mentalization”-based one when interacting with the other child, whereas only one child gave an “equity”-based justification and two children a “mentalization”-based one when interacting with the robot. A summary of the frequency distribution and statistics of acceptances and refusals for each proposed division and justification type is presented in Table 4(B). In general, these results suggest that, when children were offered a division skewing towards unfairness, equity- and mentalization-based justifications decreased substantially (χ²(4) = 16.47, p < 0.001) in favour of outcome-based justifications.
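Since the text does not fully specify the contingency table behind χ²(4) = 16.47, the sketch below assumes a 5 (division) × 2 (outcome-based vs. other justification) table, which yields four degrees of freedom; the counts are placeholders.

```python
import numpy as np
from scipy import stats

# Rows: divisions 5-5 ... 9-1; columns: outcome-based vs. other justifications.
table = np.array([[13, 18],
                  [26,  5],
                  [28,  3],
                  [29,  2],
                  [30,  1]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")  # dof = (5-1) * (2-1) = 4
```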

The analysis of children’s justifications for their behaviour—when playing both as proposers and as receivers—indicates that children tend to frame fair (and hyperfair) divisions either in terms of outcome or in terms of equity-based reasoning, with an approximately equal split. When the division skews, even slightly (i.e., a 6–4 division), towards unfairness, the justifications shift markedly towards outcome-based reasoning. It is noteworthy that mentalistic justifications were very rare, if not absent, in all proposed scenarios. This general behaviour was observed both when children played with the other child and when they played with the robot, indicating no differences in the type of reasoning underlying specific choices as a function of agency.

On the whole, the UG results suggest that children aged 5 years behave quite fairly, though showing a tendency to favour their own gain by placing and accepting unfair—and even very unfair—divisions, in line with the current literature describing pre-schoolers’ behaviour during the UG (see, for example, [60]). The lack of behavioural differences as a function of agency (HA vs. RB) further indicates that children at this age tend to disregard the other’s mind when making choices, following a very consistent strategy throughout the game on the basis of other criteria (e.g., a tendency to maximize gains) or personal traits (e.g., a disposition towards fairness). This is particularly noteworthy given that the robot was indeed considered an entity distinct from the human agent, as shown by the analysis of state attributions described above. This idea of self-centredness is further supported by the analysis of children’s justifications for their offers (proposer) and acceptances/refusals (receiver). In fact, we found an even distribution of justification types (outcome, equity) when children offered or accepted a fair amount, both to/from another child and to/from the robot, whereas we observed a markedly skewed, outcome-based pattern of justification when children offered or accepted/refused unfair divisions. These latter results are further discussed below.

4 General Discussion

This study investigated 5-year-olds’ behaviour in the Ultimatum Game. The aim was to compare young children’s game strategy when playing with a human or a robotic agent. The results showed that children’s game strategy was very similar when they played with another child and with a robot, regardless of the role played (proposer or receiver). Children were quite conservative both when offering and when accepting proposals; that is, they generally tended to offer and accept fair divisions (5–5), although almost half of the children also proposed and accepted unfair divisions, ranging from 6–4 to 9–1. These results are in line with the developmental literature on the UG. For example, Benenson and colleagues [6] observed that children aged 4 years donated less than 6- and 9-year-olds, showing that sensitivity to fairness increases with age. Furthermore, Lombardi and colleagues [60] showed that 6-year-old children were more selfish than children aged 8 and 10 years, further highlighting that younger children tend to maximize their gain, accepting even unfair offers.

Importantly for the present purpose, no significant differences were observed between offers made to, or received from, the other child and the robot. In this respect, children’s behaviour was comparable to adults’ behaviour as reported in several studies. Terada and Takeuchi [92], for example, showed that, when playing the UG, adult participants took approximately the same time to decide how to distribute goods with a human and with a robot. In the study by Kahn and colleagues [47], the frequency of acceptances and refusals approached that observed towards a human as the robot displayed more human-like features. Similar results were also found by Rosenthal-von der Pütten and Krämer [85], on the whole suggesting that, when bargaining, strategic thinking does not substantially change when playing with a human or a robotic agent.

Although behaviour in the present study was likewise comparable when children played with another child or with the robot, it is worth noting the significant, although minor, tendency of children to accept slightly unfair offers more often from another child than from the robot when playing as receivers. In this respect, one may suggest that children expect a robot to be a totally fair player. Accordingly, divergence from this initial expectation could be less acceptable than in situations in which children played with a human peer, whose behaviour could plausibly be expected to be unfair at times. Alternatively, but not mutually exclusively, one may suggest that children believe they have a distinctive status compared to robots which, as assessed through the AMPS, are perceived as different entities at all levels: epistemic, emotional, imaginative, and perceptual. In this light, too, unfair offers would be less acceptable when made by a robot than by a human peer.

Finally, with respect to the analysis of children’s justifications in the UG, it is interesting that the reasons children provided for their behaviour, both as proposers and as receivers, suggest that children aged 5 years tend to consider fair only a division that is exactly fair (5–5). A division only slightly skewed towards unfairness (i.e., 6–4) already falls within the category characterizing very unfair proposals, regardless of whether the proposals are made or received. Additionally, it is noteworthy that fair or slightly hyperfair divisions were typically justified in either outcome or equity terms, whereas unfair offers were mostly justified in terms of quantity. Mentalization-based justifications were rare for all divisions, although it is interesting that, when children played as proposers, such justifications were confined to the fairest divisions (4–5; 5–5; 6–4). In general, these considerations suggest that children, particularly in the case of unfair offers, tend not to use mentalization-based explanations, possibly to reduce the distress stemming from the discrepancy between a “socially expected” equal behaviour and their actual “selfish” behaviour. This interpretation, which highlights a moral conflict, can be further extended to the outcome-based justifications predominantly used to explain unfair offers. In fact, using a “cold” justification for morally unacceptable behaviour may, in a certain sense, serve to resolve a social-cognitive conflict (e.g., [3]), ultimately reducing the distress caused by the experimenter’s explicit request for a justification.

5 Concluding Remarks and Limitations of the Study

This study highlights the use of artificial agents as “stimuli” to evoke and analyse human social abilities in the investigation of developing psychological processes. In particular, we showed for the first time that young children’s fairness is not affected by the partner’s agency: in line with previous results, 5-year-old children’s decisions during social interactions appear self-centred and poorly oriented towards the other’s mind. Future studies could address the same issue at different ages to highlight behavioural developmental changes in older children, once higher mentalistic abilities have matured. In this respect, it is worth noting the importance of considering age in human–robot interaction, since the quality of the interaction with a robotic agent may change significantly depending on whether the partner is a child or an adult. Additionally, we showed that the analysis of children’s justifications for their behaviour may be enlightening with respect to children’s actual psychological understanding of constructs such as fairness.

It is important to stress that the present results need to be confirmed with a larger sample, allowing greater power for the effects observed. Also, we cannot be sure whether using different living or non-living entities (e.g., an animal, or a chair with drawn eyes) would produce results different from those presented here for the robot. Future studies could therefore use additional agents and eventually address online, in vivo interactions with the robot in order to support the present findings. On the whole, and as also remarked by Kompatsiari et al. [53], findings of this kind may be relevant to human–robot interaction research aimed at developing the social behaviour of artificial agents, as well as to research studying the mechanisms of social cognition.