1 Introduction

The RoboCup@Home competition [1] aims at bringing robotic platforms to use in realistic domestic environments. In contrast to other leagues like soccer, which predefine and standardize the field, robots in the @Home league need to deal with different apartment layouts, changing decorations, unknown sites, unstructured public spaces, as well as cooperating or interfering humans. Human operators are not at all – or only very briefly – instructed how to interact with the robot. Thus, the design and robustness of human-robot interaction is one of the key challenges for the RoboCup@Home competition – and especially for the Social Standard Platform League. In this paper, we treat this issue on different levels. On the level of capabilities, we extend the Pepper platform with alternative speech recognition, person recognition, and person tracking components to improve robustness. On the level of system integration, we use a flexible framework that allows us to integrate software components developed in different ecosystems and to easily configure and use on-board and off-board computation. On the level of the interaction interface (Open Challenge and Final), we offer an intuitive Augmented Reality device that gives the user a more transparent view of the state of the robot, serves as an extended sensor device for the robot, and supports a human teach-in for the configuration of new scenarios.

The RoboCup@Home competition consists of a set of benchmarking tasks that are adapted or newly defined each year. These typically require multiple capabilities, like navigation and mapping, person recognition and tracking, speech understanding and simple dialogues, object recognition and manipulation. The competition is organized into different stages. Within the first stage, tests focus on a small set of capabilities (e.g. person following and guiding or object recognition and manipulation), scoring the best two tries out of three. The stage is finalized by an integration challenge (GPSR – General Purpose Service Robot) where robots have no predefined task, but need to autonomously sequence a task given by speech. The best 50% of the teams proceed to the second stage. Here, robots are tested in an enhanced and longer version of GPSR (EE-GPSR), in a real restaurant as a waiter, and in an individual open performance (Open Challenge). The final is an extended open challenge that is judged by an internal and an external jury.

The Team of Bielefeld (ToBI) was founded in 2009 and successfully participated in the RoboCup German Open as well as the RoboCup World Cup from 2009 to 2018 with different robotic platforms. In 2016, the team finished first in the Open Platform League (OPL) [2]. At RoboCup 2017, the team achieved the third place in the OPL competition and the seventh place in the Social Standard Platform League (SSPL). Finally, the team won the SSPL world champion award at RoboCup 2018. Thus, our overall approach has been successfully ported to the Pepper platform, which has to deal with (i) limited processing capacities on the platform and the low bandwidth of the wireless connection to external computing resources, (ii) limited sensor capabilities, e.g., low range and low resolution in space and time of the ultrasonic and laser sensors, and (iii) its own ecosystem (NAOqi), which needs to be integrated with other ROS-based components. In the following sections, we describe our approach to establishing an improved development environment for the Pepper robot that supports the RoboCup activities as well as our more general research agenda on human-robot interaction.

Bielefeld University has been involved in research on human-robot interaction for more than 20 years, especially gaining experience in experimental studies with integrated robotic systems [3]. Within this research, strategies are utilized for guiding the focus of attention of human visitors in a museum context [4]. Further strategies are explored in a project that combines service robots with smart environments [5], e.g. the management of the robot’s attention in a multi-user dialogue [6]. A critical property for any human-robot interaction experiment is the reproducibility of the robotic system and its performance evaluation during its incremental development. However, this is rarely achieved [7]. This applies to experimentation in robotics as well as to RoboCup. A Technical Description Paper (e.g. [8]) – as typically submitted to RoboCup competitions – is far from sufficient to describe or even reproduce a robotic system with all its artifacts. The introduction of a systematic approach towards reproducible robotic experiments [9] has turned out to be a key factor for stabilizing basic capabilities like, e.g., navigation or person following. Together with appropriate simulation engines [10], it paves the way for automated testing of complete RoboCup@Home tasks.

Fig. 1. Robotic platforms of ToBI. Pepper is 120 cm tall; the overall height of TIAGo is adjustable from \(\approx \)110 cm to 145 cm, and that of the Floka platform from \(\approx \)160 cm to 200 cm. (\(^*\) http://innoventionsblog.blogspot.de/2014/06/meet-pepper-first-personal-robot-who.html)

2 Robot Platforms and System Description

During the last years, the RoboCup@Home and related research activities at Bielefeld University utilized different robotic platforms. In 2016, ToBI participated with the two service robots Biron and Floka [2], in 2017 with Biron and Pepper, and in 2018 with Pepper. Current research also targets the TIAGo platform. Figure 1 gives an overview of the three platforms (Pepper, TIAGo, Floka) which are still in the focus of current research activities. Although we focus on Pepper in this paper, we still aim at the development of platform-independent as well as multi-platform robot capabilities. The Social Standard Platform Pepper (cf. Fig. 1(a)) was newly introduced to the RoboCup@Home competition in 2017. It features an omni-directional base, two ultrasonic and six laser sensors. Together with three obstacle detectors in its legs, these provide navigation and obstacle-avoidance capabilities. Two RGB cameras, one 3D camera, and four directional microphones are placed in its head. It further possesses tactile sensors in its hands for social interaction. A tablet is mounted on the front of the body and allows the user to make choices or to visualize the internal state of the robot. In our setup, we use an additional laptop as an external computing resource which is connected to the on-board computer of the Pepper via Wi-Fi. Because the on-board laser sensors have a rather short range, we developed a hardware mounting for an external laser sensor (Fig. 1(b)) that can easily be attached or removed. Thereby, Pepper is enabled to build precise maps of the environment that can be used during the competition for navigating with the limited on-board laser sensors. In our research, all three robot platforms are run with the same framework but slightly different robot skill implementations. This allows a transfer of robot behaviors between platforms on an abstract level.

Fig. 2. System architecture for the Pepper platform. The software components are partially deployed on an external computing resource. The architecture abstracts from communication protocols and computing ecosystems; thus, ROS as well as NAOqi processing components can be used on the external computer as well as onboard the robot. Images are streamed in a compressed format in order to meet online processing requirements.

System Architecture: Our service robots employ distributed systems with multiple clients sharing information over the network. On these clients, there are numerous software components written in different programming languages. Such heterogeneous systems require abstraction on several levels. Figure 2 depicts a simplified overview of the system architecture used for the Pepper robot including an external processing resource – a single high-performance laptop. In our architecture, the NAOqi framework still encapsulates hardware access to the robot, but we additionally managed to run ROS on the head PC of the Pepper. Our installation includes, for instance, the entire ROS navigation stack and the depth-processing pipeline. This allows a further abstraction across different ecosystems and seamless integration. Software components from both worlds, NAOqi and ROS, can be flexibly deployed onboard or offboard the robot. Skills in the same ecosystem can communicate using ROS or native Qi messages, while those in different ecosystems communicate through a ROS wrapper.
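
As an illustration of the ROS wrapper idea, the following minimal sketch forwards text received on a ROS topic to the native NAOqi text-to-speech service via the qi Python SDK. The robot address and the topic name are assumptions for this example; the actual wrapper components in our system are more elaborate.

```python
#!/usr/bin/env python
# Minimal sketch of a ROS<->NAOqi wrapper node, assuming the qi Python SDK
# and a Pepper reachable at PEPPER_IP (address and topic name are hypothetical).
import qi
import rospy
from std_msgs.msg import String

PEPPER_IP = "192.168.1.10"   # assumption: robot address on the local network

def main():
    rospy.init_node("naoqi_tts_wrapper")
    session = qi.Session()
    session.connect("tcp://{}:9559".format(PEPPER_IP))   # default NAOqi port
    tts = session.service("ALTextToSpeech")              # native NAOqi service

    def on_say(msg):
        # Forward text received on a ROS topic to the NAOqi TTS engine.
        tts.say(msg.data)

    rospy.Subscriber("/pepper/say", String, on_say)
    rospy.spin()

if __name__ == "__main__":
    main()
```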

The computational resources on the robot’s head PC are limited. Thus, only components that are time-critical, e.g. for safe and robust autonomous navigation, are deployed on the head PC, while other skills, like people perception, speech recognition, semantic scene analysis, and behavior coordination, run on the external laptop. In order to meet online processing requirements in certain robot behaviors, e.g. person following, depth and color images are streamed in a compressed format, achieving frame rates of approximately 10 Hz.
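
A minimal sketch of how an off-board component could consume such a compressed stream is given below; the topic name is a placeholder, and the decoding relies on the standard sensor_msgs/CompressedImage message rather than on any specific component of our system.

```python
#!/usr/bin/env python
# Sketch of consuming a compressed RGB stream off-board (topic name hypothetical).
import cv2
import numpy as np
import rospy
from sensor_msgs.msg import CompressedImage

def on_image(msg):
    # JPEG-compressed payload -> OpenCV BGR image for downstream perception skills.
    buf = np.frombuffer(msg.data, dtype=np.uint8)
    frame = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    if frame is None:
        rospy.logwarn("could not decode frame")
        return
    rospy.loginfo_throttle(5, "frame %dx%d" % (frame.shape[1], frame.shape[0]))

rospy.init_node("offboard_image_consumer")
rospy.Subscriber("/pepper/camera/front/image_raw/compressed",
                 CompressedImage, on_image, queue_size=1, buff_size=2**22)
rospy.spin()
```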

The robot behavior is coordinated using hierarchical state machines. The hierarchical structure consists of re-usable building blocks that refer to abstract sensors and actors, skills, and complete task behaviors. A typical abstract sensor would be a people detector, while a typical skill would be person following that already deals with certain interferences or robot failures, like briefly losing and then re-acquiring the human operator. As far as possible, we re-use robot skills that have already been used on previous RoboCup@Home or related research systems [2], like Floka or TIAGo. However, this has certain limits if, e.g., a person-following skill is based on dense, longer-range, high-frequency laser scans. The laser scans of the Pepper platform only achieve a frame rate of 6.66 Hz with very low resolution and a limited reliable range. Therefore, we merge the LIDAR data with depth information from the camera located in the head of the robot. However, this requires the robot to look down rather than up to watch for people, which conflicts with other robot behaviors and introduces new dependencies in the skill and behavior design of the robot. Abstracting skills from task behaviors still leads to a description of task-level state machines that are agnostic with regard to such considerations. The explicit definition of skills further allows us to reason about them and track their success during the performance of the robot. Based on this, new elements have been introduced during the last years, like reporting on the success and failure of tasks assigned to the robot in GPSR [2].
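
The layering of abstract sensors, skills, and task behaviors can be sketched as a hierarchical state machine. The example below uses the ROS SMACH library purely for illustration, with hypothetical states and outcomes; it does not reflect the actual coordination framework of our system.

```python
# Illustrative sketch of the sensor/skill/behavior layering using SMACH.
import smach

class DetectPerson(smach.State):
    """Abstract sensor: wraps a people-detector component."""
    def __init__(self):
        smach.State.__init__(self, outcomes=["found", "not_found"])
    def execute(self, userdata):
        # ... query the perception component here ...
        return "found"

class FollowPerson(smach.State):
    """Skill: person following, including re-acquiring a briefly lost operator."""
    def __init__(self):
        smach.State.__init__(self, outcomes=["arrived", "lost"])
    def execute(self, userdata):
        # ... drive towards the tracked person until they stop ...
        return "arrived"

# Task behavior composed of re-usable building blocks.
follow_task = smach.StateMachine(outcomes=["succeeded", "aborted"])
with follow_task:
    smach.StateMachine.add("DETECT", DetectPerson(),
                           transitions={"found": "FOLLOW", "not_found": "DETECT"})
    smach.StateMachine.add("FOLLOW", FollowPerson(),
                           transitions={"arrived": "succeeded", "lost": "DETECT"})
```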

Fig. 3. Simulation of RoboCup@Home tasks for Pepper in MORSE.

Development, Testing, and HRI Simulation: The continuous and incremental software development process is based on the Cognitive Interaction Toolkit (CITK) [9]. It provides a framework that allows us to describe, deploy, and test systems independent of the underlying ecosystem. Thus, the concepts apply to ROS-based components and systems as well as to those defined with, e.g., NAOqi. Combined with an appropriate abstraction architecture, re-usability of components and behaviors can be achieved across platforms. The CITK framework has already been applied to the Nao platform as well as to previous RoboCup systems, including the Pepper platform in 2017 and 2018. For the RoboCup@Home SSPL competition, we further work on enhancing our simulation approach, which allows us to easily switch between the real hardware and a simulated environment including virtual sensors and actors. In order to keep our cross-platform approach, we utilize the MORSE simulation framework [11], which additionally offers extended possibilities for modeling virtual human agents for testing human-robot interaction scenarios [10].

The software dependencies – from operating system dependencies to inter-component relations – are completely modeled in the description of a system distribution, which consists of a collection of so-called recipes [9]. In order to foster reproducibility/traceability and potential software (component) re-use of the ToBI system, we provide a full specification of the 2016 system in our online catalog platform. The catalog provides detailed information about the soft- and hardware system, including all utilized software components, as well as the facility to execute live system tests and experiments remotely. The MORSE simulation environment [11] allows us to conduct human-robot interaction experiments and provides virtual sensors for the cameras and laser-range sensors (see Fig. 3(a)). The virtual image streams and laser scans are published on the same ROS topics as used by the real sensors. In Lier et al. [10], we show how to utilize this framework for automated testing with a virtual human agent interfering with the navigation path of the robot (see Fig. 3(b)).
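
A MORSE scene is described with a short Python builder script; the sketch below illustrates the general pattern of attaching virtual sensors and exporting them on ROS topics. The robot model, sensor placements, topic names, and the chosen environment are placeholders, not the actual ToBI simulation setup.

```python
# Minimal MORSE builder sketch (simulation scene description). A dedicated
# Pepper model is assumed to exist in our setup; here a stock MORSE robot
# stands in for it, and topic names are illustrative only.
from morse.builder import *

robot = ATRV()                       # placeholder for the Pepper model
robot.translate(x=1.0, y=0.0, z=0.0)

camera = VideoCamera()               # virtual RGB camera
camera.translate(x=0.2, z=1.1)
robot.append(camera)
camera.add_interface('ros', topic='/pepper/camera/front/image_raw')

scan = Hokuyo()                      # virtual laser scanner
scan.translate(x=0.25, z=0.3)
robot.append(scan)
scan.add_interface('ros', topic='/pepper/scan')

motion = MotionVW()                  # velocity interface, driven by the ROS nav stack
robot.append(motion)
motion.add_interface('ros', topic='/cmd_vel')

env = Environment('indoors-1/indoor-1')   # stock apartment-like scene
env.set_camera_location([5, -5, 6])
env.set_camera_rotation([1.0, 0, 0.8])
```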

Fig. 4. Enhanced capabilities of the Pepper system: mixed-reality HRI.

3 Research on MR-HRI and Bimanual Handovers

Facilitating HRI by Mixed-Reality Techniques: Further research is conducted with the Pepper platform in order to explore how human-robot interaction can be facilitated by mixed-reality techniques (Fig. 4) [12, 13]. Augmented and mixed reality techniques are already applied in various areas of robotics development and debugging [14,15,16,17]. Apart from tele-operators and developers, everyday HRI can also be enhanced using AR/MR techniques by displaying virtual avatars on physical robots [18], creating spatial dialogues [19], augmenting the work cell of an industrial robot arm [20], or communicating intended movements [21]. In our ongoing work, we pick up these ideas but add a novel approach which we expect to be even more helpful for a human user. Our aim is not only to use an MR headset for visualizing data, but also to integrate its sensor data, giving the user a more direct interface to the robot. This way, on the one hand, the user can always be aware of the current robot status and intent. On the other hand, the robot can integrate the human’s location in the environment as well as data from the MR headset’s sensors, e.g. RGB-D streams. Moreover, voice instructions can be given remotely, even when the robot is located in another room.

For implementing such a scenario, we integrated the Microsoft HoloLens into our ROS-based robotic framework [22]. The Unity3D game engine was used for the implementation on the HoloLens. Communication between the MR device and ROS was realized using MQTT. Making use of the room-scale tracking capabilities of the HoloLens, we only had to calibrate the coordinate systems of the robot and the MR device once, initially. This was done by displaying an AR marker on the tablet attached to Pepper. After this marker has been detected once, the robot is correctly represented in the coordinate system of the HoloLens and vice versa. Pose updates of the robot are used to also update its representation in the HoloLens.

To facilitate interactions with the robot, we use AR in two different ways. First, as in previous work, sensor data are visualized to give the user a better grasp of the robot’s capabilities. Here, we show the map and the robot’s localization on it, as well as the costmap and laser scans. The planned path is shown to make the user aware of the robot’s next movements (Fig. 4(b)). Thus, the user is able to understand why, e.g., the robot is not able to reach its current navigation goal. This also helps the human to take a path that does not interfere with the path of the robot. For grasping, the robot can visualize its grasp space when it is not able to reach an object. This way, the robot can actively ask the user for help, communicating information that would otherwise not be obvious. Second, since the HoloLens is registered with the robot’s coordinate system, it can be used as an additional sensing and input device. In our example case, the robot thereby gains knowledge about the user’s position and orientation in the environment. Wherever the user goes, she can instruct the robot to come and fulfill a task by a voice command interpreted by the AR device. The user’s view through the Microsoft HoloLens can be seen in Fig. 4(a).

In the RoboCup Open Challenge and Final, we successfully realized a mixed-reality HRI application scenario simulating the business case of a robotic version of Airbnb. In a first run, the owner of an apartment teaches the robot a sequence of behaviors for welcoming and introducing a guest. To do so, she or he uses the HoloLens to teach in places: using the localization of the HoloLens, the owner simply walks through the apartment wearing the AR device and remotely instructs the robot what to say at which place in the apartment. After that, when the guest arrives, the robot takes the initiative, identifies the guest based on face identification, and proceeds with a tour through the apartment in guiding mode, telling the guest the taught-in information about each room. At the kitchen counter, the robot is further instructed to check whether any drinks are missing. Any feedback is communicated to the owner’s mobile device without the owner being required to be present at the site.
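
The ROS-MQTT coupling towards the HoloLens can be sketched roughly as follows; the broker address, MQTT topics, JSON payload layout, and the use of the /amcl_pose topic are assumptions for this example rather than the exact interface of our system.

```python
#!/usr/bin/env python
# Hedged sketch of a ROS<->MQTT bridge towards the HoloLens: robot pose updates
# go out over MQTT, voice commands come back in as ROS messages.
import json
import rospy
import paho.mqtt.client as mqtt
from std_msgs.msg import String
from geometry_msgs.msg import PoseWithCovarianceStamped

BROKER = "192.168.1.42"                 # assumption: MQTT broker on the laptop

rospy.init_node("hololens_bridge")
cmd_pub = rospy.Publisher("/hololens/voice_command", String, queue_size=10)

client = mqtt.Client()

def on_mqtt_message(cli, userdata, msg):
    # Voice command recognized on the HoloLens -> ROS topic.
    cmd_pub.publish(String(data=msg.payload.decode("utf-8")))

client.on_message = on_mqtt_message
client.connect(BROKER, 1883)
client.subscribe("hololens/command")
client.loop_start()

def on_pose(msg):
    # Robot localization estimate -> HoloLens, to keep the hologram aligned.
    p = msg.pose.pose
    payload = json.dumps({"x": p.position.x, "y": p.position.y,
                          "qz": p.orientation.z, "qw": p.orientation.w})
    client.publish("robot/pose", payload)

rospy.Subscriber("/amcl_pose", PoseWithCovarianceStamped, on_pose)
rospy.spin()
```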

Fig. 5. Enhanced capabilities of the Pepper system: bimanual handover.

Bimanual Object Handovers: Although the Pepper robot is not made for grasping, delivery tasks are a typical use case expected from a service robot at home. Regular grasping strategies do not apply because the Pepper hand is under-actuated, offering only opening and closing, the hand is too small to grasp typical household objects, the large backlash of the arms’ gears results in imprecise control, and, finally, the depth camera in the head provides no valid 3D data at grasping distance. For the RoboCup@Home competition, we explored two different strategies for compensating the deficiencies of the platform. First, we implemented an interactive strategy for an open-loop handover of a customized tray. Here, the operator triggers the robot by speech to get into a predefined handover pose. Then the robot takes over and instructs the operator to hook one side of the tray into its right hand and to touch the back of the hand. The robot then moves the left hand to its final position while the operator hands in the other side of the tray. Finally, the robot closes both hands to hold the tray in front of its body. This strategy was successfully used in the Restaurant task to carry the drinks and combos from the bar to the guests.

In the Final, we further showed an enhanced strategy for autonomous bimanual handovers. Here, the robot perceives a 3D box presented to it, computes appropriate grasping points for both arms, and plans a synchronized bimanual movement to grasp it. First, the box is segmented using a model-free segmentation algorithm on the depth images [23], combined with a fitting of box- or cylinder-shaped primitives. This step is computed at a larger distance than the grasping distance because of the limitations of the depth camera. Figure 5 illustrates the grasping process. As shown in the upper part, the manipulation strategy is robot-agnostic. It applies the Task Constructor framework of MoveIt! [24]. Here, sub-tasks are defined in stages which model atomic movement-planning problems. A generator stage was used to build a bi-manual grasp generator. The initially generated grasp consists of two poses, one for each end-effector. These are sampled on the object’s surface such that both end-effectors apply enough force on the object between them to hold it. As the Pepper arms only have 5 DoF, a full 6D pose cannot be reached exactly; the KDL inverse kinematics solver was therefore patched to allow lower precision in one rotational DoF. The trajectories of the two arms were generated independently and then synchronized in a merging container that combines both into a single trajectory. First successful handovers were shown at the RoboCup 2018 Final.
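
The core idea of the bi-manual grasp generator can be illustrated with a small geometry-only sketch: given the pose and size of a detected box, two opposing palm positions are placed slightly inside the side faces so that closing both hands squeezes the object. Frames, dimensions, and the squeeze offset are assumptions; the actual generator is realized as a stage in the MoveIt! Task Constructor.

```python
# Geometry-only illustration of opposing bi-manual grasp points on a box.
import numpy as np

def bimanual_grasp_poses(box_center, box_size, squeeze=0.01):
    """box_center: (x, y, z) in the robot frame; box_size: (depth, width, height).

    Returns palm positions for the left and right hand, placed slightly inside
    the side faces so that closing both hands applies force on the object."""
    cx, cy, cz = box_center
    _, width, _ = box_size
    half = width / 2.0 - squeeze
    left_palm = np.array([cx, cy + half, cz])    # palm facing -y
    right_palm = np.array([cx, cy - half, cz])   # palm facing +y
    return left_palm, right_palm

# Example: a 15 x 25 x 10 cm box held 35 cm in front of the robot at chest height.
left, right = bimanual_grasp_poses((0.35, 0.0, 0.85), (0.15, 0.25, 0.10))
print("left palm at", left, "right palm at", right)
```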

Fig. 6. Accumulated scores from the stage-1 and stage-2 tests in RoboCup@Home SSPL. The numbers on the x-axis refer to the tests; the tests marked \(X^*\) were won by the ToBI team.

4 Analysis and Lessons Learned

During the RoboCup@Home competition, the Pepper robot of our team achieved significant scores in all tests of the competition. In Fig. 6, the results of the five best teams of RoboCup@Home SSPL 2018 are shown for the stage-1 and stage-2 tests. ToBI achieved the best performance in the Speech & Person Recognition, Cocktail Party, Help Me Carry, GPSR, Open Challenge, Restaurant, and EE-GPSR tests. For most of the required capabilities, the onboard Pepper components were replaced by other available standard libraries that were integrated using the hybrid architecture presented above. The key components (navigation, speech processing, basic person detection) were deployed on the on-board computer so that the team was not severely affected by WLAN dropouts.

The navigation component – we used ROS GMapping and the standard ROS planning pipeline – worked robustly with the laser and camera depth data, which were fused in a preprocessing step. There was only a single dropout in the Tour Guide task, where the robot blocked itself because of a failure in the setup procedure. For speech processing, we used PocketSphinx with context-specific grammars that were adapted to each dialogue step. In most conditions this worked quite well. However, in cases where there was too much noise, we offered an additional user interface on Pepper’s tablet, where the robot’s question and the options for the user’s answer were displayed as buttons or menus.
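
As an illustration of such a context-specific grammar, the sketch below restricts a PocketSphinx decoder to a small JSGF grammar for a confirmation step. The grammar contents, model paths, and audio handling are placeholders, and the older SWIG-based Python API of pocketsphinx is assumed.

```python
# Sketch of a context-specific PocketSphinx setup: a small JSGF grammar for a
# confirmation dialogue step, swapped in before listening.
from pocketsphinx import Decoder

CONFIRM_GRAMMAR = """
#JSGF V1.0;
grammar confirm;
public <answer> = (yes | no | yes please | no thank you) [robot];
"""

with open("/tmp/confirm.gram", "w") as f:
    f.write(CONFIRM_GRAMMAR)

config = Decoder.default_config()
config.set_string("-hmm", "/usr/share/pocketsphinx/model/en-us/en-us")            # acoustic model (placeholder path)
config.set_string("-dict", "/usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict")
config.set_string("-jsgf", "/tmp/confirm.gram")        # restrict the search to this grammar
decoder = Decoder(config)

def recognize(raw_pcm_chunks):
    """raw_pcm_chunks: iterable of 16 kHz, 16-bit mono audio buffers."""
    decoder.start_utt()
    for chunk in raw_pcm_chunks:
        decoder.process_raw(chunk, False, False)
    decoder.end_utt()
    hyp = decoder.hyp()
    return hyp.hypstr if hyp is not None else None
```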

For development and testing, we exploited the dedicated toolchain for reproducible experimentation in robotics described in Sect. 2 [9, 25]. In the Cognitive Interaction Toolkit (CITK), there is a versioned description of any – incrementally developed – system distribution including all software and data dependencies, which is automatically built on a continuous integration (CI) server. This allowed us to track down any system change that might have caused or fixed an error. Another important aspect is to design robot behaviors for the complete competition and not only for a single test. For example, furniture or persons sometimes blocked navigation goals in the apartment; thus, the robot needed to deal with these situations all the time – not only when mentioned in the rulebook. A general behavior for memorizing and reporting on these events even achieved additional points for the team in the EE-GPSR test. Reasoning about task performance will be a critical capability for future RoboCup@Home developments. The mixed-reality scenario implemented for the Open Challenge and Final was ranked first by the league-internal and external juries. This shows the general interest in possible application scenarios for a standard platform like Pepper.

5 Conclusion

We have described the main features of the architecture and technical solution of the ToBI system for the RoboCup@Home Social Standard Platform League (SSPL). Several key points were essential for winning the SSPL competition in Montreal 2018.

(i) Heterogeneous computing environments: We proposed a general approach – the Cognitive Interaction Toolkit – that allows us to deal with distributed computing and different ecosystems in a unified manner. This allowed us to integrate ROS-based components, on-board NAOqi skills, and even external sensor devices like the HoloLens (with communication based on MQTT) in a systematic, easy-to-use build process. This is also essential for keeping a vivid exchange with the other sub-leagues, whose systems are mainly based on ROS.

(ii) Graceful degradation: Even if the Wi-Fi breaks down or it is too noisy to recognize speech, the platform must continue to work. This was realized, e.g., by installing essential components directly on the robot’s head PC, or by offering a continue-rule interface on the robot’s tablet. We further introduced several ways of augmenting the platform: adding an external laser for mapping, using a tray for transportation, and connecting a HoloLens for extended human-robot interaction. This opens new ways to keep the Pepper platform competitive with open robot platforms, which is important for developing all sub-leagues towards common goals.

(iii) Modular behaviors that are applicable whenever needed, in contrast to programming fixed behavior sequences for pre-defined tasks: An example is the robot’s capability to report on what went wrong. Such an approach opens up further possibilities to interactively teach a robot appropriate behaviors, which was shown in the Airbnb scenario in the RoboCup@Home SSPL Final.

Overall, the SSPL league has made a large step forward in performance compared to last year, when significantly lower scores were achieved. This makes us confident that there is still much potential for further developments, which will significantly profit from an intensified exchange between the RoboCup@Home leagues. We presented several avenues for technically supporting this.