1 Introduction

Industrial production is moving from mass production to what is called “mass customization”, where batches can be as small as a single unit Bruner and Kisgergely (2019); Matthias (2015). In this new scenario, Human Robot Collaboration (HRC) using collaborative robots (a.k.a. cobots) provides the flexibility that is required, and consequently the presence of collaborative robots in industrial facilities is increasing significantly Cobots for Flexible Automation (2021); Williams (2018).

In this scenario, the most important requirement to meet is safety Lohse (2009); Bolstad et al. (2006); Onal et al. (2013), but the use of standard safety methods in industry, such as curtains or physical barriers, makes it impossible for humans and robots to collaborate by sharing the same workspace.

In compliance with current regulations Robots and robotic devices (2016, 2011a, 2011b), cobots are designed and built to be intrinsically safe, with safety-ensuring capabilities such as anticipating collisions and reacting to them Matsas et al. (2018). However, even though intrinsic techniques have been demonstrated to be sufficiently safe Malm et al. (2019), users might still experience a sense of risk when entering the area of influence of a robot. This concern arises from the possibility of potential hazards, such as malfunctions in safety mechanisms or risks associated with the robot holding objects. Collision detection techniques are thus insufficient on their own to provide the situation awareness that promotes a sense of safety and preserves trust in robots Lasota and Shah (2015). In the long term, the stress caused by working in a state of permanent uncertainty due to poor situation awareness can damage health Oken et al. (2015). Improving situation awareness is thus necessary for workers’ well-being and safety, as well as to obtain a good user experience (UX) from collaborating with robots Lohse (2009); Bolstad et al. (2006); Onal et al. (2013). In recent years, and partly motivated by the need to improve situation awareness, research on mixed reality (MR) techniques applied to human-robot interaction (HRI) has increased Bagchi et al. (2018); Green et al. (2007); Pan et al. (2012); Rowen et al. (2019).

Recent literature on situation awareness shows that previsualizing robot movement trajectories is a commonly used technique that often leads to significant improvements in safety and performance during human-robot collaboration Macciò et al. (2022); Cleaver et al. (2021); Tsamis et al. (2021); Rosen et al. (2019). However, even with previsualized trajectories, users are still not fully aware of the possible dangers of sharing a space with the robot, and such a lack of information could negatively affect their perception of safety Cong Hoang et al. (2021). In this research, we also use previsualization of robot trajectories, to match common practice. Taking this baseline design as a starting point, we propose adding multimodal (auditory and/or visual) MR display presentations of hazard-related situation awareness information, which may improve the perception of safety. According to the classification by Suzuki et al. (2022), our research aims to develop an on-body approach that augments the surroundings to improve the user’s awareness of the robot’s state. With such hazard-related information, the goal is that users are in control of their own safety, reassured that they are aware at all times of the varying levels of potential hazard that result from the proximity between robot and worker, while both move across the shared workspace. The idea of granting control to the human provides the opportunity to evaluate whether the designs developed are sufficient to maintain user safety and to observe differences between modalities. By allowing the human worker to decide how to position themselves in relation to the moving robot, we can better understand the effectiveness of our safety designs and their impact on perceived safety and task efficiency.

Although this might contravene current safety regulations, one of the aims of this work is to assess the perceived sense of control over one’s own safety. In real work scenarios, deployment of hazard awareness displays would be done in conjunction with any other necessary safety mechanisms, as required by risk analysis exercises and mandatory regulations.

This paper makes several contributions. We first identify and discuss the limitations of current HRC safety strategies. In response, we propose an MR application that extends beyond simply informing users of the robot’s upcoming trajectory. Our application provides real-time updates on multiple danger levels through both audio and visual cues, ensuring users remain aware and safe while working alongside robots in a shared workspace. In this way, users not only know whether they are in danger, but also the level of danger they are exposed to, even when they are not looking directly at the robot. The designs of an auditory and a visual hazard awareness display are reported, and their implementations described. We then report an experimental user study (n=24) comparing the performance and subjective experience obtained from using the auditory display, the visual display, and the audio-visual display that results from combining both, while completing a task in a space shared with a moving physical robot.

2 Related work

Safety in robotics is most often discussed in the context of industrial applications, where the development and adoption of robots have been fastest. Manufacturing industries commonly use Computer Aided Design (CAD) and Manufacturing (CAM) systems in order to deal with rapid technological progress and aggressive economic competition on a global scale Makris et al. (2014).

In the last ten years, AR has emerged as a key technology in manufacturing Montuwy et al. (2017); De Pace et al. (2018). This technology assists in the workspace by presenting, in the worker’s field of view, the information they need at each moment. AR devices offer the possibility to display dynamic information and enhance the UX in modern industrial workplaces, including those using cobots. AR devices have been used as aids for workers in a variety of sectors, such as the automotive industry Doshi et al. (2017), assembly Evans et al. (2017), logistics Reif and Günthner (2009), shipyards Blanco-Novoa et al. (2018) and construction Li et al. (2018). AR has also been used for assisting humans working alongside fenceless collaborative robots in a variety of contexts Makris et al. (2016); Matsas and Vosniakos (2017); Wen et al. (2014).

At the same time, the increase in the use of cobots creates a need to improve safety and situation awareness, with the aim of reducing the anxiety created by working with robots. AR has proved its potential in addressing this issue. Brending et al. (2016) suggested that the use of AR in HRC could reduce anxiety by showing contextual information to a human operator working in close proximity with a robot. Vogel et al. (2016) suggested that a projection-based AR system could improve HRC.

With this motivation, Vogel and others Vogel et al. (2011, 2013, 2015, 2021) developed a projection-based sensor system, using cameras and data from robot controllers to monitor the target object’s position and the configuration of the robot. Based on this information, they projected on the work surface safety areas around the robot and the object to be grasped. If any object crossed the boundary of the safety areas, the robot stopped. While safe, this behavior is not compatible in practice with collaborative work requiring the worker to continuously cross that boundary. The worker may need to be made aware of the situation, without the robot slowing down or stopping altogether. At the same time, utilizing 2D AR interfaces requires the user to shift their focus between the display and physical workplace. Improved situation awareness is also needed when the worker has to look away from the robot during collaboration. In such contexts, extended reality (XR) technologies offer the opportunity to present information about the relative position of the robot, thus warning the worker about the potential hazard.

Prior attempts have been made to improve such awareness using XR technologies. In Hietanen et al. (2020), the authors proposed a workspace model for HRC based on three different zones (robot, human and danger zones) Bdiwi et al. (2017), which were continuously monitored and updated. This model was presented to the user through two different AR-based interactive user interfaces: projector-mirror and HoloLens. However, in this study, the user was only allowed to enter the workspace when the robot was stopped, and the boundaries, which conveyed the actual configuration of the robot, were displayed even when the user was in a safe space. In Aivaliotis et al. (2023), the same situation is observed, where only boundaries related to the actual configuration are displayed, and no data concerning the user’s state is shown. In Bolano et al. (2018), the authors made use of visual and acoustic information to warn users about the robot’s intention, making the planned trajectories easier to understand. Nevertheless, the acoustic information only provided notifications about new trajectory re-planning and execution, while the visual data was displayed on a screen, causing users to shift their attention while working. Using Unity and Vuforia, Palmarini et al. (2018) developed an AR interface which virtually displayed the planned trajectory of the robot before it was executed. They concluded that AR systems improve the trust of the operator in collaborative tasks. In Gruenefeld et al. (2020), the authors presented three methods for expressing robot intention using gradient maps based on warm-cold color mapping. However, in this approach, users must focus on the robot or its immediate surroundings to understand the robot’s intentions and their current state. This requirement can cause users to alternate their attention between their task and the robot, potentially affecting task efficiency. In addition, gradient mapping has been shown to require more cognitive resources to interpret, which can increase mental workload and might not be adequate for safety-critical situations where quick interpretation is crucial Ware (2019). Matsas et al. (2018) prototyped proactive and adaptive techniques for HRC in manufacturing using virtual reality (VR). They compared the effectiveness of each technique but did not analyze the effectiveness of the information displays used. In San Martín and Kildal (2019), we analyzed different audio and/or visual hazard information display designs, aimed at a generic static source of hazard emerging from a static vertical axis in space. A user study showed that the auditory and visual displays obtained through iterative design were reasonably equivalent, both in the behaviors they elicited as hazard displays and in the hazard-related information they conveyed. In a second user study San Martin and Kildal (2021) we compared the single-modality displays and the audio-visual display resulting from the combination of both. However, the scenario in San Martin and Kildal (2021) involved hazards of an undetermined nature, emerging from vertical holographic axes fixed in space, and with no resemblance to robots either in shape or in behavior.

While advancements in MR technologies have increased their usage in HRC applications, their potential to improve situation awareness remains insufficiently researched. Proposed MR solutions for HRC often fall short of providing comprehensive insights into the user’s safety status during collaborative work with robots. To address this, we propose an MR application that goes beyond merely offering information about the robot’s next trajectory. Our application provides real-time updates on various levels of danger through both audio and visual cues while users are actively working alongside robots in the shared workspace. This real-time feedback informs users about the degree of risk they are facing and the specific factors contributing to each level of danger, empowering them to navigate potential hazards more effectively. Concurrently, our work aims to evaluate and assess the effectiveness of audio, visual, and audio-visual displays and analyze the strengths of each display in an HRC scenario.

3 Developing an awareness display for HRC

The information displayed to raise awareness of hazards in HRC scenarios is dependent on the characteristics of such scenarios. In the context of this paper, HRC scenarios involve a worker that shares space with a real robot arm, where both robot and human cooperate to complete a task. Robot and human move in relation to each other and, based on real time information from the awareness display, it is the human who modifies their behavior to modulate exposure to the hazard resulting from proximity with the robot.

For the HRC scenario in this paper, the physical cobot arm selected was an LBR Iiwa 14 robot. To preserve user safety during the display design process, a virtual version of the scenario was used, consisting of the digital twin of the selected LBR Iiwa 14 robot, which preserved the dimensions and kinematics of the real robot. A virtual single-legged white round table was added to the otherwise empty collaboration space, with a virtual object placed on the table. Target positions for the object on the table were represented as small transparent boxes. In this scenario, a simple HRC task involving the manipulation of the boxes amid the interfering movement of the virtual robot could be carried out, to test display designs developed in successive iterations.

The virtual scenario was displayed on a Microsoft HoloLens 2 head mounted display (HMD). We selected the HoloLens 2 because it can deliver visual information even when users have their back to the robot, while letting them see the real world through the display, and it can provide directional information through 3D spatial sound. Additionally, the HoloLens generalizes more easily to other scenarios Park et al. (2021) and potentially reduces the setup costs associated with cameras and projectors. For the implementation, we used Unity (v.2020.3.11f1) and the Unity Robotics Hub (v.0.7.0) with the ROS-TCP Connector (v.0.7.0) in order to connect with the Robot Operating System (ROS). A Dell computer with an Intel i7 CPU and 32 GB of RAM was used to run ROS, MoveIt and Gazebo, which provided the capability to simulate the robot. The computer running ROS was integrated directly with the robot; it moved the robot to the positions set from the HoloLens, and the visualization in the HMD was then updated in real time with the simulated robot. The HoloLens exchanged information with the robot’s computer through the ROS-TCP connection over WiFi. Figs. 1 and 2 show representations of the visual and auditory hazard awareness displays that we designed and tested with the interactive setup just described. Details of each display design are described in the following subsections.
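
As a minimal sketch of this Unity-ROS link (not the authors' code), the following MonoBehaviour subscribes to a joint-state topic through the ROS-TCP Connector and applies the received joint angles to the transforms of the digital twin. The topic name, the generated JointStateMsg class and the simple revolute-joint mapping are assumptions for illustration and may differ from the actual setup.

```csharp
// Sketch only: drives the virtual LBR Iiwa digital twin in Unity from ROS joint states
// received through the ROS-TCP Connector. Topic name, message class and the
// revolute-joint mapping below are assumptions, not the authors' implementation.
using UnityEngine;
using Unity.Robotics.ROSTCPConnector;
using RosMessageTypes.Sensor; // assumed generated class for sensor_msgs/JointState

public class DigitalTwinJointDriver : MonoBehaviour
{
    [Tooltip("One transform per robot joint, in the order used by the joint_states message.")]
    public Transform[] jointTransforms;   // assumed: the 7 links of the LBR Iiwa twin
    public Vector3[] jointAxes;           // assumed: local rotation axis of each joint

    private double[] latestPositions;     // radians, written by the ROS callback

    void Start()
    {
        // Subscribe to the joint states published by the ROS/MoveIt/Gazebo side.
        ROSConnection.GetOrCreateInstance()
            .Subscribe<JointStateMsg>("/joint_states", msg => latestPositions = msg.position);
    }

    void Update()
    {
        if (latestPositions == null) return;
        int n = Mathf.Min(jointTransforms.Length, latestPositions.Length);
        for (int i = 0; i < n; i++)
        {
            // Apply each joint angle (radians -> degrees) about its local axis.
            float deg = (float)latestPositions[i] * Mathf.Rad2Deg;
            jointTransforms[i].localRotation = Quaternion.AngleAxis(deg, jointAxes[i]);
        }
    }
}
```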

3.1 Design of the visual display

In order to design the visual display, we first analyzed which color codes are standard for hazard warnings. We based our choice of colors on this analysis, so that users could recognize the level of hazard they were exposed to while collaborating with the robot. Zielinska et al. (2014, 2017) reported that red is perceived as the most hazardous of all colors, followed by orange and yellow in second and third place. Although there was no significant difference in the hazard perceived between these two colors, orange was perceived in both studies as more dangerous than yellow. Hence, red was used for the highest level of hazard, orange for a middle hazard level and yellow for the lowest level of hazard. From these studies it can also be inferred that green is perceived as signaling a space that is safe or with a very low level of danger. Hence, we decided to display green for one second once the participant left the hazardous space, as confirmation of having entered a safe space.

However, we wanted to make users aware not only of the level of hazard they were exposed to, but also of how far they were from the focus of danger. To achieve this, the dynamics and movements of the robot were considered: for each level of hazard, the hazard volume was displayed wrapping around the robot. The resulting shape of the colored hazard zone was informative about the distance to the robot, as it varied in shape and color with the movement of the robot and with the level of hazard of the space occupied.

Fig. 1

Visual Display Designs: a1, b1 and c1 show exact shapes of the display concept for each range of the hazard zone. a2, b2 and c2 show the hazard volumes implemented in each case, using simple geometric shapes that fully contain the exact concept shapes and are easy to interpret by the user

Following these ideas, the visual display was designed as follows:

  • Low level of danger, yellow: This level of danger warned users when they entered a space in which they could be reached by the robot (the region of influence of the robot). This space was represented by the whole reachable volume of the robot, which is static (similar to Fig. 1-a1). We decided to represent it with a simple geometric shape, a sphere (Fig. 1-a2) to make it easier to understand and estimate the extent of that region.

  • Middle level of danger, orange: The orange level (a sub-region of the yellow level) corresponded to a region closer to the source of danger, linked to the specific trajectory that the robot was executing at the time. In each trajectory, the robot moved from a current position to a target position, and the volume created by all the poses along that trajectory was the volume used to represent the orange hazardous space. The exact shape created by all the poses of the target trajectory could be quite complex (similar to what is represented in Fig. 1-b1). To simplify the shape and make it easier to understand at a glance, we approximated it with a cylinder (as shown in Fig. 1-b2) that contained, by excess, the complete exact volume. The cylinder’s radius and height were obtained by computing the maximum distances in x, y and z from the center of danger, considering the robot’s configuration and target point. This volume changed constantly with the movement of the robot (a code sketch of this computation, together with the skin thickness described in the next item, follows this list).

  • High level of danger, red: The volume with the highest level of danger (a sub-region of the orange volume) represented imminent danger. For a robot, this corresponds to the immediate proximity to the position currently occupied by the robot. This hazard space was a skin-like layer around the robot, as shown in Fig. 1-c2. The thickness of the skin above the robot was 0 mm when the robot was static, and it grew linearly with the speed of displacement of the robot. The maximum speed allowed for the displacement of the robot was 1 m/s, which corresponded to a thickness of the red region of 125 mm above the surface of the robot.
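
The following is a minimal geometric sketch, under Unity's y-up convention, of how the orange trajectory cylinder and the red skin thickness described above could be computed; the sampled trajectory poses and the center of danger are assumed inputs, and this is not the authors' implementation.

```csharp
// Sketch only: derives the orange bounding cylinder and the red skin thickness
// from the quantities described in the text. Unity's y axis is taken as vertical
// (an assumption); sampledPoses are points sampled along the current trajectory.
using System.Collections.Generic;
using UnityEngine;

public static class HazardVolumes
{
    // Orange level: bounding cylinder, centred on the centre of danger c, that
    // contains (by excess) every sampled pose of the trajectory being executed.
    public static void TrajectoryCylinder(IEnumerable<Vector3> sampledPoses, Vector3 c,
                                          out float radius, out float height)
    {
        radius = 0f;
        float maxUp = 0f, maxDown = 0f;
        foreach (Vector3 p in sampledPoses)
        {
            Vector3 d = p - c;
            radius  = Mathf.Max(radius, new Vector2(d.x, d.z).magnitude); // horizontal extent
            maxUp   = Mathf.Max(maxUp, d.y);                              // highest point
            maxDown = Mathf.Min(maxDown, d.y);                            // lowest point
        }
        height = maxUp - maxDown;
    }

    // Red level: skin thickness grows linearly with robot speed,
    // 0 mm when the robot is static, 125 mm at the 1 m/s maximum speed.
    public static float RedSkinThicknessMm(float robotSpeedMetersPerSecond)
    {
        return Mathf.Clamp01(robotSpeedMetersPerSecond / 1.0f) * 125f;
    }
}
```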

As shown in Fig. 1, in the visual display design, users could see at all times the boundaries of each danger volume activated by their presence, to help them identify the way out quickly and easily.

3.2 Design of the auditory display

For the auditory display, we used a short sound pulse (440 Hz, 50 ms) that was repeated at a varying rate when the user was inside the region of danger, depending on the distance to the robot. The sound was spatialized, to help the user perceive in which relative location in the surrounding space the robot was. The sound was single pitch, instead of a range of frequencies, in order to convey a more moderate sense of urgency Edworthy et al. (1991). Chords or harmonically complex sounds were also avoided, so that the sound did not interfere with other audition-based communication channels during interaction, such as natural speech Edworthy and Hellier (2006). We used tempo as the parameter to map the distance to the origin of the danger Giang and Burns (2012). Zobel (1998) concluded that users could perceive a strong differentiation in urgency between 1 Hz, 2 Hz, 4 Hz and 6 Hz pulse rates, and could also detect higher urgency at higher rates such as 8 Hz. Following this idea, we used a pulse rate of between 2 and 10 Hz, depending on the distance of the user to the center of danger, located at the centroid of the geometric configuration of the robot (the “c” point in Fig. 2I). The pulse rate increased linearly as the user got closer to the c point.

The auditory signal activated at the same position as the visual display’s yellow zone did, i.e., as soon as the operator entered the region of influence of the robot (Fig. 2I-a). The pulse rate was lowest when the robot reached its furthest position in the opposite direction to the user. In this way, the equation relating pulse rate to user proximity was obtained (see the equation in Fig. 2I-c). Using the head related transfer functions (HRTFs) from the MRTK, the sound was spatialized, so that the user could hear the hazard warning emerging from a specific position in the space around them, and thus locate it easily.
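
A minimal sketch of this mapping is given below, assuming a simple linear interpolation between 2 Hz at the largest attainable distance to c (user at the boundary of the region of influence, robot at its furthest position away from the user) and 10 Hz at c itself; the exact equation used in the study is the one shown in Fig. 2I-c.

```csharp
// Sketch only: maps the user's distance to the centre of danger c onto the pulse
// rate of the 440 Hz / 50 ms warning tone, as an assumed linear interpolation.
using UnityEngine;

public static class AuditoryHazardMapping
{
    public const float MinPulseHz = 2f;   // at the largest attainable distance to c
    public const float MaxPulseHz = 10f;  // at the centre of danger

    // maxDistance: largest user-to-c distance at which the warning can play
    // (user at the boundary of the region of influence, robot furthest away).
    public static float PulseRateHz(float distanceToCentre, float maxDistance)
    {
        float proximity = 1f - Mathf.Clamp01(distanceToCentre / maxDistance); // 0..1
        return Mathf.Lerp(MinPulseHz, MaxPulseHz, proximity);
    }

    // Interval between pulse onsets, e.g. to schedule AudioSource playback of the pulse.
    public static float PulsePeriodSeconds(float pulseRateHz)
    {
        return pulseRateHz > 0f ? 1f / pulseRateHz : float.PositiveInfinity;
    }
}
```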

Fig. 2

I Auditory Display Designs: Frequency (f, in Hz) of the auditory signal depending on the proximity of the user to the center of danger. II Collision Detection of User: Objects that represent the user: a full body; b full body with extended arm. They were used to detect collisions with hazard spaces

3.3 Collision detection of user with hazard spaces

Unity’s built-in collision detection, which reports when two objects collide, was used to determine when the user entered each hazard zone.

The visible shapes described in the visual design and the invisible reachable space described in the auditory design were all objects created in the Unity scene. Some of these, such as the orange and red shapes, were updated during execution to exhibit the behavior just described. Interacting with them required identifying potential collisions of the user with the objects of the different hazard levels. For this, another three objects were created to represent the user’s body.

The first object was a rectangular box that represented the user’s body. This object was placed at the position of the HMD, and so it moved with the user. It could be edited when the application was launched, by inserting the user’s dimensions in x, y and z (Fig. 2II-a). It was assumed that the user would work sitting down or standing up, so the object only rotated about the z axis of the glasses (Fig. 2II-b). A problem arose when the user reached out with a hand, because the hand then came out of the box object containing the body. To handle this, the HoloLens’s articulated hand-tracking was used, which tracks the user’s hands whenever they are inside the HMD’s field of view. Whenever the HoloLens detected a hand, an object with its corresponding label (‘Hand_left’, ‘Hand_right’) was added to the scene. Making use of these objects, a box was attached to each arm, as shown in Fig. 2II-b, so as to represent the worker’s hands when either of them came out of the body’s box.

Once the objects representing the user were created, a script detected when the operator collided with any hazard zone (OnCollisionEnter) and identified the corresponding hazard level. With this information, if the visual modality was used, the corresponding hazard zone was turned on. If the modality was auditory, the audio with its corresponding pulse rate was played. In the same way that collisions were detected, events of the user leaving these objects (OnCollisionExit) were also detected. In the visual display, when the user left the lowest danger zone, a green light was displayed for one second to inform the user that they were safe.
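
A minimal sketch of such a script is shown below; the zone tags, component layout and condition flags are assumptions for illustration, and the body and hand boxes are assumed to carry the collider and Rigidbody components that Unity requires for these callbacks to fire.

```csharp
// Sketch only: collision handling as described above, attached to the box objects
// representing the user's body and hands. Zone tags and components are assumed.
using System.Collections;
using UnityEngine;

public class HazardZoneListener : MonoBehaviour
{
    public bool visualModality;          // set at launch according to the condition
    public bool audioModality;
    public AudioSource pulseSource;      // plays the 440 Hz / 50 ms warning pulse
    public Renderer greenConfirmation;   // green light shown for one second on exit

    void OnCollisionEnter(Collision collision)
    {
        // Hazard-zone objects are assumed to be tagged "YellowZone", "OrangeZone" or "RedZone".
        Debug.Log($"Entered hazard level: {collision.gameObject.tag}");

        if (visualModality)   // show the boundary of the zone the user has just entered
            collision.gameObject.GetComponent<Renderer>().enabled = true;

        if (audioModality && collision.gameObject.CompareTag("YellowZone"))
            pulseSource.Play();   // start the warning sound; its pulse rate is driven
                                  // elsewhere from the distance to the centre of danger
    }

    void OnCollisionExit(Collision collision)
    {
        if (visualModality)
        {
            collision.gameObject.GetComponent<Renderer>().enabled = false;
            if (collision.gameObject.CompareTag("YellowZone"))   // left the lowest zone
                StartCoroutine(ShowGreenForOneSecond());          // confirm the user is safe
        }

        if (audioModality && collision.gameObject.CompareTag("YellowZone"))
            pulseSource.Stop();   // silent once outside the region of influence
    }

    IEnumerator ShowGreenForOneSecond()
    {
        greenConfirmation.enabled = true;
        yield return new WaitForSeconds(1f);
        greenConfirmation.enabled = false;
    }
}
```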

4 Modality comparison user study

Once the design and implementation of the two modality-specific hazard awareness displays were completed, we conducted an experimental user study (n = 24) with the physical robot, to assess each display on its own, as well as the audio-visual display that resulted from the combination of both. To ensure participants’ safety, any sharp or pointed object that could harm participants was removed, and the robot was set to move very slowly during the study, with a mean speed of 0.25 m/s, so that users could react easily to the robot’s movement. The experimental study was also performed in an open space where participants could not get trapped by the robot. In addition, an observer holding a wireless emergency stop button was present at all times and was responsible for stopping the robot immediately, should any unforeseen hazardous situation arise.

One objective of the study was to assess the three display versions in terms of user performance, by observing the extent to which displays enabled users to remain safe during collaboration. Another objective was to assess the subjective experience that each unimodal display, and the multimodal combination of both, elicited in users during collaboration with a physical robot. By conducting the study also for the combined audio-visual display, we aimed to understand which strengths and weaknesses of each modality-specific display remained in the multimodal version, and if the simultaneous presentation of all modality-specific information (which was partly redundant and partly complementary between modalities) resulted in a safer display for collaboration. Further, we wanted to understand how the experience of using the multimodal display compared to using each of its constituent unimodal displays alone.

4.1 Participants

We recruited 24 participants (12 male and 12 female) with a mean age of 28 years (SD = 5.87), \(CI_{95\%}\)[26.152, 30.847], who volunteered to take part in the study. 16 of the participants had some previous experience with augmented reality, 13 of whom had experience with a HoloLens HMD. Only 5 of the 24 had previous experience with collaborative robots. Participants were recruited from a population of researchers and applied engineers with various degrees of seniority in different areas of technical knowledge.

4.2 Experimental task

We designed a three-condition within-groups repeated-measures study to compare the performance and UX obtained when collaborating with a robot using each of the three hazard awareness display versions. An HRC task was designed to be completed in each condition: (i) visual condition (visual display only), (ii) audio condition (auditory display only), (iii) audio-visual condition (simultaneous presentation of both the audio and visual displays). The order in which conditions were presented to the participants was fully counterbalanced, and in each condition the task was repeated 3 times.

Fig. 3

Scenario of the user study. The first step to set up the scenario was matching the virtual robot (dashed green robot) with the real robot. In any condition, while the robot was operating on its own, the participant was kept busy making mental calculations for a number series. Once the robot called the participant, she moved to the robot’s table (III) to perform the assigned task (T1, T2 or T3). While the participant performed the task at the table, the robot moved in a loop between predetermined points: T1 [P11–P18]; T2 [P21–P27]; T3 [P31–P39]. During each condition the experimenters (II) took notes about how the participant performed the task

We set up an experimental HRC pick-and-place activity in the open space of an industrial facility, where noise from machinery operating in the same facility could be heard in the background. This choice of environment was representative of an actual industrial facility in which workers (target users of our solution) and robots typically cooperate in real production scenarios. For the robot used in this study, we chose an LBR Iiwa robot arm, which was securely affixed to a table for its task. An inspection task was selected for the study, representative of tasks commonly found in industrial production Brito et al. (2020); Zheng et al. (2021). More specifically, the selected task required the robot to inspect specific documents (see Fig. 3).

To evaluate the hazard awareness displays, we wanted a task in which the user and the robot had to occupy the same shared space at various times during the execution of the cooperative task.

To that end, we devised a task in which participant and robot had to cooperate. The task consisted of placing a numbered sheet in its corresponding position in the stack of numbered pages that formed an unbound book, so that the robot could inspect the whole book, page by page. This task aligns with relevant industrial activities, such as the need to coordinate with robots and ensure human accuracy in placing items correctly within a limited time frame. The precision required in this task can also be generalized to various industrial applications, including assembling electronics, automotive parts, and packaging. This generalizability makes it a valuable model for studying and improving HRC across multiple sectors.

First, the participant was located at a separate workstation from the robot’s table, executing a distractor task that involved performing mental arithmetic calculations (see Fig. 3I). This approach was used to minimize any mental practice that might lead to learning or the development of strategies for subsequent repetitions Schmidt and Bjork (1992). After a waiting period, the robot requested the participant’s intervention by playing a sound and displaying a yellow circle in the upper left side of the HoloLens display. Such a request took place three times in each condition.

Upon receiving the request, the participant moved to the robot’s table to take a numbered sheet. They then proceeded to the other side of the robot’s table, where a stack of numbered pages was placed, and inserted the sheet into the corresponding location. The user had to complete the insertion of each page with the destination document on the table, which was shared with the robot. The participant was not allowed to remove the document from the table and return it after inserting the loose sheet. Instead, the participant had to carry out the insertion while the robot was operating on the opposite side of the table, and had to pause the insertion task to avoid the robot when it approached their vicinity. Once a loose sheet was inserted in its correct position, the participant had to return to the individual workstation, further away from the robot’s table, press an orange virtual end button, and resume the distractor task of writing down arithmetic calculations, until the robot requested the participant’s help again.

The task was the same for the three conditions (audio, visual and audio-visual). It also remained the same for all the repetitions in each condition: pick a sheet and place it in the corresponding location following the numerical order of the sheets. For each condition, the location of the book on the table was selected from among three options (see positions T1, T2, T3 in Fig. 3 III) randomly, but ensuring partially counterbalanced exposure and without repeating a location for the same participant. Each location of the book on the table also determined where the participant would take (one-by-one) the sheets from. The location of the book also determined the route along which the robot moved in a repeating cycle during the task. Movement routes of the robot were designed so that the robot interfered with the participant in the execution of the task. Thus, and referring to Fig. 3, with the book in T1, the participant picked the sheets from location T2, and the robot followed cyclically the trajectory between points P11 and P18. With the book in T2, the participant picked the sheets from location T1, and the robot followed cyclically the trajectory between points P21 and P27. When the book was in T3, the participant picked the sheets from location T2, and the robot followed cyclically the trajectory between points P31 and P39. When deciding locations and robot trajectories, we analyzed the workspace to create 3 variants of the task that were equivalent in difficulty.

We opted for pre-defined and pre-sampled trajectories across all subjects to ensure a more balanced and controlled analysis. This approach aimed to assess how effectively each modality (audio, visual, or audio-visual) helped to maintain user safety without introducing external factors, such as dynamic changes in robot trajectories, which might interfere with the control of the conditions. By keeping the trajectories consistent across all users, we ensured that any observed differences in user safety could be attributed to the feedback modality rather than to variability in the robot’s movements. It should also be noted that users were not told that the trajectories were pre-defined.

Table 1 Threshold values used for gamification scoring, categorized by speed (task execution time), staying safe (a score that depends on the number of times higher-level hazard zones were entered), and correctness (how well the task was performed)

The task was gamified to encourage participants to execute the task as well as possible according to the instructions. Three aspects of task execution (speed, staying safe and correctness) were measured or evaluated, resulting in a numerical score of up to 15 points (up to 5 points for each aspect). The aim was for users to obtain a mark as close to 15 as possible. Time to complete the task was taken as the measure of speed, and depending on the time spent participants obtained a score from 1 to 5 (see Table 1, Speed column). As a measure of staying safe, each intrusion into the orange hazard region counted as one point of entry scoring, and each intrusion into the red region as two. The smaller the entry scoring, the higher the score obtained in this aspect, with a maximum of 5 points (see Table 1, Staying Safe column). In addition, the observer rated numerically how correctly the instructions had been followed in keeping the book on the robot’s table during the insertion of loose sheets. Here too, the best value was 5 (see Table 1, Correctness column). Each participant was informed about the scores obtained after the three conditions were completed.
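
The sketch below illustrates the structure of this scoring scheme (5 points per aspect, orange intrusions counted once and red intrusions twice); the threshold arrays are placeholders, since the actual cut-off values are those listed in Table 1.

```csharp
// Sketch only: structure of the gamification score. The threshold arrays passed in
// are hypothetical placeholders; the real cut-offs are those given in Table 1.
public static class GamificationScore
{
    // Maps a measured value onto 5..1 using ascending thresholds (lower is better):
    // thresholds[i] is the highest value that still earns a score of 5 - i.
    static int BandScore(float value, float[] thresholds)
    {
        for (int i = 0; i < thresholds.Length; i++)
            if (value <= thresholds[i]) return 5 - i;
        return 1;
    }

    public static int Total(float taskSeconds, int orangeEntries, int redEntries,
                            int correctnessScore /* 1..5, rated by the observer */,
                            float[] speedThresholds, float[] safetyThresholds)
    {
        int speedScore = BandScore(taskSeconds, speedThresholds);
        int entryScoring = orangeEntries * 1 + redEntries * 2;   // weighted intrusion count
        int safetyScore = BandScore(entryScoring, safetyThresholds);
        return speedScore + safetyScore + correctnessScore;      // maximum of 15 points
    }
}
```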

4.3 Experimental procedure

Each participant read and signed a consent form and filled out a demographic questionnaire before the experimental session started. Having read a description of the task, every participant received a training session and practiced executing the task with feedback from each of the display versions. For safety reasons, the training session was conducted by interacting with the virtual representation of the robot’s digital twin, not with the real robot used during the execution of the study task. During the training session, the robot’s digital twin was placed on the table where the real robot would later be placed throughout the experiment.

After the training session, the scenario was prepared to perform the study. To implement the interaction and the responses designed for the displays, the real robot and its digital twin had to be present, with the digital twin perfectly superimposed on the real robot. The presence of the virtual replica of the real robot was only necessary to detect proximity with the user wearing the HoloLens device. Thus, once the position of the virtual replica was calibrated with the real robot, with both versions of the robot moving together, the proximity of the user to the digital robot also indicated the proximity of the user to the physical robot. However, since from the user’s perspective the scene only included one physical robot, the virtual replica of the robot was made invisible to the participants, who only saw the physical robot. In this way, participants could observe the hazard awareness information over the real robot (see Fig. 4). For a live demonstration of the designs (audio, visual and audio-visual) in a real environment, and of the training session, please refer to the video (https://youtu.be/JvLUrciz2hM).

The volume of the HoloLens speakers used for the training and experimental tasks was standardized for all participants to ensure consistency. The volume was set at a level that was sufficient for all users to clearly hear the auditory display without exceeding 80 decibels (dB), in line with the World Health Organization (WHO) recommendations for safe listening levels. This setup ensured that all participants could hear the auditory cues clearly while also protecting their hearing.

Fig. 4

Augmented real scene view and schematic representation of increasing level of hazard (yellow-orange-red) as a person gets closer to a collaborative robot. a1 and a2 whole volume reachable by the robot; b1 and b2 volume likely to be reached in the current configuration of the robot; c1 and c2 immediate proximity of the robot. When the user is outside any danger zone, the hazard zones are not visible

4.4 Metrics

After each condition, the participants responded to two post-task questionnaires: a Single Ease Question (SEQ) using a 7-point Likert scale, where 1 represents the worst value and 7 the best Sauro (2012), and a raw NASA-TLX (RTLX) questionnaire Hart (2006), extended with the category “irritability”. The generalized practice of adding subscale categories is acknowledged and welcomed by the creators of the original scale Hart (2006), as long as the index is calculated with the originally validated set of subscales, as we do. As highlighted by Haas and Edworthy (1996), the design of auditory warnings plays a critical role in conveying urgency and can significantly influence user perception and emotional state. Adding irritability as a factor allows for a more comprehensive assessment of user experience and mental workload, accounting for the potential emotional strain caused by auditory stimuli. The NASA-TLX uses a scale with 21 gradations, where 1 is the most positive value and 21 the most negative, except for the Performance scale, which is framed positively and whose values are hence reversed. At the end of the experimental session, participants were asked to select the best and the worst hazard awareness displays (the auditory, the visual or the audio-visual display) with respect to the following categories: pleasantness, capacity to capture attention, capacity to provide a sense of safety, capacity to convey a sense of distance to the origin of hazard, and preference. The selections made by every participant were used to initiate discussion in the semi-structured interview that followed, with the aim of obtaining further insights from participants regarding their experience during the three conditions. We aimed to analyze aspects such as how they perceived their safety, whether the danger was correctly conveyed, the overall experience they had in each condition, and any design improvements they suggested. These questions provided qualitative insights that enhance and complement the quantitative metrics obtained during the user study, helping us identify issues or positive aspects experienced by users that may not be fully captured by the quantitative data alone. This qualitative feedback allows us to contextualize the metrics, ensuring a more comprehensive evaluation and refinement of the system. The interview was enriched by discussing the participant’s responses to the extended RTLX questionnaire. The interviews were recorded and subsequently transcribed, and the transcriptions were analyzed using the affinity diagramming methodology described in Lucero (2015).
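
As a small sketch of how the RTLX index reported in the results could be computed under these conventions (an assumption on our part, not code from the study): the unweighted mean of the six original subscales on the 21-point scale, with Performance reversed and the added Irritability item excluded.

```csharp
// Sketch only: one consistent way to compute the RTLX index described above.
// Subscale values are on the paper's 1..21 scale; Performance is reversed because
// it is framed positively; the added Irritability item is not part of the index.
public static class RawTlx
{
    public static float Index(int mental, int physical, int temporal,
                              int performance, int effort, int frustration)
    {
        int reversedPerformance = 22 - performance;   // reverse a 1..21 scale (1 <-> 21)
        return (mental + physical + temporal + reversedPerformance + effort + frustration) / 6f;
    }
}
```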

During the execution of the conditions and the post-task questionnaires, two experimenters were involved. One of them observed and interacted with users throughout the entire process, while the other was solely an observer. Both experimenters participated in the development of the data analysis and the creation of the affinity diagram.

5 Results

We analyzed both quantitative and qualitative data collected during the study. Quantitative results included objective observed data and quantified subjective data obtained through questionnaires. Qualitative data were obtained from the semi-structured interviews conducted at the end of each study session.

For the analysis of quantitative data, we used effect size (ES) estimation techniques, specifically Cohen’s d, along with 95% confidence intervals (CI), instead of null-hypothesis significance testing (NHST). Cohen’s d quantifies the size of the difference between means, offering a more informative basis for evidence accumulation. Additionally, the 95% CI provides a range of values within which the true effect size is likely to be contained, offering a clearer picture of the precision and reliability of the estimated effect and allowing for a more informed interpretation of the results. Cohen’s d provides a standardized measure of effect size by quantifying the difference between two group means in terms of standard deviations. The interpretation of Cohen’s d values is typically guided by the following thresholds: a |d| of 0.2 or less reflects a small difference between groups; values from 0.2 to 0.5 indicate a small to medium effect, a more pronounced difference that is not yet medium; values from 0.5 to 0.8 indicate a medium, practically meaningful difference; and a |d| of 0.8 or more represents a large, substantial difference between groups. These thresholds are based on conventions established by Jacob Cohen Cohen (2013, 1992) and are widely used to contextualize research findings, enabling researchers to assess the strength of observed effects. In this way, readers can extract their own critical conclusions, as currently recommended for user studies in human-robot interaction Cumming (2014); Dragicevic (2016).
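
For reference, a minimal sketch of these computations is given below; whether the study used the pooled-SD or the paired-samples variant of Cohen's d is not stated, so the pooled-SD form and a normal-approximation CI are shown as assumptions.

```csharp
// Sketch only: Cohen's d with a pooled standard deviation, the |d| bands used in the
// text, and a normal-approximation 95% CI of a sample mean (mean +/- 1.96 * SE).
using System;
using System.Linq;

public static class EffectSize
{
    public static double CohensD(double[] a, double[] b)
    {
        double meanA = a.Average(), meanB = b.Average();
        double varA = a.Sum(x => (x - meanA) * (x - meanA)) / (a.Length - 1);
        double varB = b.Sum(x => (x - meanB) * (x - meanB)) / (b.Length - 1);
        double pooledSd = Math.Sqrt(((a.Length - 1) * varA + (b.Length - 1) * varB)
                                    / (a.Length + b.Length - 2));
        return (meanA - meanB) / pooledSd;
    }

    public static string Band(double d)
    {
        double ad = Math.Abs(d);
        if (ad < 0.2) return "small";
        if (ad < 0.5) return "small to medium";
        if (ad < 0.8) return "medium";
        return "large";
    }

    public static (double low, double high) Ci95(double[] x)
    {
        double mean = x.Average();
        double sd = Math.Sqrt(x.Sum(v => (v - mean) * (v - mean)) / (x.Length - 1));
        double half = 1.96 * sd / Math.Sqrt(x.Length);   // normal approximation
        return (mean - half, mean + half);
    }
}
```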

Table 2 Means, standard deviations (in parentheses) and 95% confidence intervals of the quantitative data for each condition
Table 3 Cohen’s d values of the quantitative data, compared between conditions

5.1 Observed quantitative results

Fig. 5 shows all the time-related data measured during the experiments. The top row summarizes the time spent inside the different hazard levels. For every level of danger, there was almost no difference between conditions in the total average time spent in them (see the Time in Yellow, Time in Orange and Time in Red columns in Table 2). In every condition, participants spent the bulk of the time in yellow and orange hazard zones, with less time spent in orange zones. Looking at the Cohen’s d values shown in Table 3, there is no difference in ES in the yellow and red areas, since all |d| values are smaller than 0.2. In the orange area, the effect size between audio and visual was also negligible, with a value of 0.0065. However, both audio vs audio-visual (d = \(-\)0.2279) and visual vs audio-visual (d = \(-\)0.2174) showed small negative effects, indicating that the audio and visual conditions involved slightly less time spent in orange zones than the audio-visual condition. The mean time spent in red zones was almost 0 s in every condition. In terms of the time needed to perform the task, and the percentage of task time spent inside hazardous spaces (see the Time Performing a Task and % of Task Time in Danger columns in Table 2), there was also almost no difference between modalities, as confirmed by the Cohen’s d values in Table 3. As shown, the percentage of time spent inside the hazardous area was above 50% in all modalities. As explained before, to perform the task the users had to enter the reachable space of the robot, and therefore the hazardous space. But as observed in Table 2, users spent most of that time inside the yellow area (low level of hazard). Regarding the total time spent inside hazardous areas while performing a task, only small differences were observed between conditions, although the average time in the audio-visual condition (28.76 s (SD = 17.10), \(CI_{95\%}\)[24.81, 32.71]) was slightly larger than in the audio (25.65 s (SD = 12.91), \(CI_{95\%}\)[22.66, 28.63]) and visual (27.0 s (SD = 14.33), \(CI_{95\%}\)[23.69, 30.31]) conditions. The Cohen’s d values further indicate that the time spent in hazardous areas in the audio condition was slightly less than in the audio-visual condition, with a value of \(-\)0.2054, suggesting a small to medium effect size.

Fig. 5

This image presents the plots for the quantitative data, showcasing the mean values, 95% CI, and distributions of each condition using violin plots. The violin plots are constrained to the maximum and minimum values observed in the data. Top row: Time (seconds) spent in zones with different levels of danger. Bottom row: total time spent in danger zones of any level, task completion time and the relative proportion between the two. (A: audio display only, V: visual display only, AV: both displays)

5.2 Questionnaire results

Figure 6 summarizes the answers to the SEQ questionnaire, rating on a 7-point Likert scale how easy it was to perform the task in each condition Sauro (2012). On average, in all the modalities, task execution was rated as easy, with small differences in the mean values and with mostly overlapping confidence intervals. The average ease rating with the audio-visual display (5.79 (SD = 1.32), \(CI_{95\%}\)[5.26,6.31]) was slightly lower than with the audio display (6 (SD = 0.93), \(CI_{95\%}\)[5.62,6.37]) and the visual display (6 (SD = 0.78), \(CI_{95\%}\)[5.68,6.31]). This is also confirmed by the Cohen’s d values, as none of them was over 0.2 (audio vs visual: d = 0; audio vs audio-visual: d = 0.1825; visual vs audio-visual: d = 0.1924). Thus, the type or amount of hazard information provided did not seem to affect how easy the task appeared to be.

Fig. 6

This image presents the plots for the responses to the SEQ questionnaire, showcasing the 95% CI, mean values, and distributions of each condition using violin plots. The violin plots are constrained to the maximum and minimum values observed in the data

Fig. 7

Results obtained from the extended raw NASA-TLX questionnaire, presented by condition, with 95% confidence intervals (error bars), mean values, and violin plots representing the data distribution. The irritability category was added as an extension to the questionnaire, and therefore it was not used to calculate the RTLX index. Numerical values of means and confidence intervals are shown in Table 4

Table 4 Results from the extended raw NASA-TLX questionnaire: average values, with standard deviation and 95% confidence intervals
Table 5 Cohen’s d results from the extended raw NASA-TLX questionnaire

In analyzing the extended raw NASA-TLX questionnaire, irritability (the added category) was not used to calculate the RTLX index. As shown graphically in Fig. 7 and numerically in Table 4, the differences in mean values observed between conditions were very small in the 6 categories that form the questionnaire, and in the RTLX index itself. However, despite the large overlaps between modalities, some small to medium differences in ES were observed in Table 5. There were small to medium differences in performance when comparing audio vs visual (d = \(-\)0.2166), and similar values were observed for visual vs audio-visual (d = 0.2455). In the frustration category, the same range of ES differences was observed in the visual vs audio-visual comparison (d = \(-\)0.2054). In the irritability category, small to medium ES differences are again noted in audio vs audio-visual (d = \(-\)0.2263) and visual vs audio-visual (d = \(-\)0.2462). Although this category was added because auditory displays are often found to be irritating, this was not evident when compared with the visual display condition. Finally, in the RTLX index, a small to medium difference is also observed in the visual vs audio-visual comparison (d = \(-\)0.2310).

Fig. 8

Results from the post-study questionnaire. The graph shows the number of participants that selected each condition as the best (upwards) and the worst (downwards), for the categories shown

Results from the analysis of the post-study questionnaire administered at the end of the experiment are shown in Fig. 8, in which users were asked to select the best and worst condition in the categories pleasantness, capture of attention, sense of safety, perception of distance to danger origin, and preference.

In the pleasantness category, the auditory display gathered most of the negative votes, which was expected from the literature on auditory displays. This was also the reason why we added an Irritability category to the NASA-TLX questionnaire, although no difference between displays was observed there (Fig. 7).

In all the other categories, the audio-visual display received the most positive votes, well ahead of the single-modality displays in every case. For these categories, negative votes were fairly evenly split between the single-modality displays. An exception was the capture-of-attention category, in which the visual-only display was clearly considered the worst. In the sense-of-safety category, however, the auditory display received more worst votes than the visual display.

5.3 Interview results

All the semi-structured interviews conducted at the end of each participant’s session were first transcribed, and user comments were then organized using post-it notes, following the methodology outlined in Lucero (2015).

Each post-it note was formatted as follows:

  • Top left corner: user number.

  • Top right corner: modality, and the trajectory used for that modality.

  • Middle: user’s comment.

  • Bottom right corner: post-it number.

Each post-it thus recorded the user’s comment, the user number, the modality the comment referred to, the task performed in that modality, and the post-it number (see Fig. 9b). Once all the comments were transcribed onto the post-it notes, a team of three people read through the post-its individually. After the initial reading, the team discussed the post-its one by one and started to identify potential clusters. This iterative process of reading and discussion was repeated multiple times, ensuring thorough engagement with the data.

During these discussions, the team worked collaboratively to categorize the post-it notes into broader classes and subclasses. We further refined all classes by indicating whether the comments were positive or negative. In instances where a comment was relevant to more than one class, the corresponding post-it note was duplicated (with the duplicate retaining the original numbering) and placed in the appropriate additional classes. This whole process took 7 days and a total of 21 h.

Fig. 9

Affinity diagram created from the analysis of subjective participant statements collected during semi-structured interviews. a Affinity diagram formed after clustering all the handwritten post-its. b Handwritten post-its used in the affinity diagram, showing information about the user (#01-#24), the modality discussed (A: audio display only, V: visual display only, AV: both displays), the trajectory used for the modality (T1, T2 or T3), the post-it number and the comment

Once all team members agreed with the distribution and clustering of the post-its (see Fig. 9a), they were transcribed digitally using Excel. This digital transcription facilitated further analysis and visualization. When all the data had been entered into Excel, it was transformed into a graph to enhance the readability of the results.

The resulting affinity diagram graph is shown in Fig. 10. We identified 5 main groups in which to classify all the interview data: Description of Danger, Safety Perception, Experience and Perception, Versatility and Design Features. These main groups were divided into the different characteristics shown in Fig. 10.

Fig. 10

Graph obtained from affinity diagram created through the analysis of subjective participant statements collected during the semi-structured interviews. The graphs under each leaf category inform about the number of participants that provided positive (up in the scale) or negative (down in the scale) comments about that category, in reference to a condition in the study

The transcription of all participant comments produced 173 post-it notes, which were most successfully organized in the 5 branch categories just mentioned, which contained 14 leaf categories in total. The following is an outline of each leaf, from left to right:

  • Complete information: Comments on how complete the information received was (“The information of the hazard is more complete” P24).

  • Change of status: Awareness of transitioning between different levels of hazard (“The visual display helps to identify a change in the level of danger I am exposed to” P1).

  • Direction of danger: Awareness of the direction of the source of danger, with respect to the participant (“You can understand the direction and distance to danger” P12).

  • Distance to center of danger: Awareness about how far the center of the danger was (“You can understand the direction and distance to danger” P12).

  • Safety perception: Comments about perceived safety (“I felt more safe with audio modality” P11).

  • Time pressure: Experiencing urgency to complete the task (“It pressures me to make the task fast” P20).

  • Mental demand: Comments about how mentally demanding the task was (“It provides too much information” P3).

  • Irritability: Comments about displays being irritating (“The audio is irritating” P17).

  • Capture attention: Comments about the ability of a display to capture attention (“It is irritating but it captures your attention” P24).

  • Pleasantness: Considerations about how pleasant a display was (“All of them are pleasant” P6).

  • Intuitiveness: Considerations about how intuitive the interaction with a display was (“It is more simple and hence more intuitive” P22).

  • Help in movement: Comments about a display being particularly helpful while the user was moving from one place to another in the shared workspace (“while walking, the shapes are very helpful” P6).

  • Multitasking: Comments about the display providing support for the participant to simultaneously focus on task execution and monitoring of the robot (“It helps with safety, while you are concentrating on performing a task” P18).

  • Shapes: Comments that mention the outside shape of the volumes for different hazard levels (only relevant for visual information) (“The shapes help to understand the direction of the hazard” P10).

More quotes from the participants, representative of the data collected from the interviews, are included in the discussion section.

6 Discussion

The core conclusion from this study is that all three modality conditions were suitable for an HRC scenario, since almost no differences were observed between values and ESs (none of those observed reached a medium difference), although each display presented specific strengths, mostly impacting UX rather than performance.

In all three conditions the difficulty of the experimental task was rated as low (see Fig. 6) and no differences were observed between ESs. In the NASA-TLX categories, some small to medium differences in ESs were noted (see Table 5). Users reported slightly better performance in the visual condition compared to the audio-only and audio-visual conditions. The audio-visual condition was found to be more frustrating than the visual condition and more irritating than both the audio and visual conditions. The worse reported performance and higher frustration appear to negatively impact the RTLX index: the audio-visual condition shows a small to medium negative difference in ES compared to the visual condition, indicating a worse task load index. Regarding time-related metrics (see Fig. 5), the differences observed between conditions were also small. While being cautious not to read too far into these time-related results, the small differences seem to suggest that, with both visual and auditory information simultaneously on display (AV condition), participants dared to venture further into higher-risk regions (working in closer proximity with the robot), and for longer (e.g., time spent in the orange area). This observation is supported by the Cohen’s d values, where a small to medium difference was observed for the audio-visual condition compared to the other conditions in time spent in the orange area, indicating that users tended to spend more time inside the orange area under this condition. Rather than interpreting this as a worse performance for remaining in the safer yellow region during task execution, it might be understood that more complete awareness information (complementary and redundant from both sensory channels) provided the necessary reassurance and confidence for participants to remain closer to the robot, although this did not result in a lower task completion time. As for the highest-risk situations, participants successfully remained away from the no-go red region with every display design, suggesting that all three conditions might be similarly effective in helping users remain at a safe distance from the robot.
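For readers who want to interpret the effect-size language above, the usual conventions place small, medium and large Cohen’s d at roughly 0.2, 0.5 and 0.8. The following is a minimal sketch of how such a value can be computed from two condition samples; the study does not specify which variant of the statistic was used, and all names and numbers in the example are illustrative rather than actual study data.

```python
import numpy as np

def cohens_d(sample_a, sample_b):
    """Cohen's d with a pooled standard deviation for two samples.

    This is the classic formulation; the study does not state which exact
    variant of the statistic was computed, so this is only illustrative.
    """
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_sd = np.sqrt(((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1))
                        / (n_a + n_b - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Hypothetical time-in-orange-area values in seconds (NOT study data)
audio_visual = [12.1, 9.8, 14.3, 11.0, 13.5]
visual_only = [11.6, 9.2, 13.8, 10.5, 12.4]
print(f"d = {cohens_d(audio_visual, visual_only):.2f}")  # d = 0.36, small to medium
```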

The affinity diagram analysis of the interview data allowed for a more nuanced understanding of the UX in the different conditions. Leaf categories such as “complete information” confirmed the previous idea that participants consciously realized that they were better informed in the audio-visual (AV) condition (15/24 participants provided statements to this effect). In the “direction of danger” category, it is remarkable that this aspect of situation awareness seemed to be boosted by the combination of both modalities, and the capacity of the display to capture attention was also perceived to be best in that same case. Determining the distance to the robot also seemed to benefit from the simultaneous perception of both modalities, as supported by statements such as “the audio-visual display helped me understand if the robot was approaching or leaving and understanding the state I was in with respect to it” (participant P23). This was also reflected in the post-study questionnaire, where most users voted positively regarding the safety-related aspects conveyed by the audio-visual display (capture of attention, feeling of safety and distance to the center of danger). The audio-visual display was in fact voted the most preferred display.

As a result, the subjective perception of safety (the “Safety Perception” leaf category) was highest in the bimodal condition, with no negative aspects stated about it in the study.

Reportedly, the visual display on its own helped users move in the shared workspace with respect to the also-moving robot (“Help in movement” leaf category). The singular characteristic of the visual display that provided this assistance was the visible shape of each hazard zone. Summarized in the “Shapes” leaf category, 4/24 participants provided positive statements, such as “while walking, the shapes are very helpful” (P6). This could be because the collaboration took place in a shared space in which the relative distance between robot and human (the source of a potential hazard for the human) varied when either, or both, moved. The visible shapes of the hazard zones were then seen in changing perspective, which made their extent in space easier to understand. While this visual shape feature was also present in the multimodal condition, participants referred to it only in relation to the single-modality visual display, possibly because it filled in a cognitive gap left by the absence of spatial auditory feedback. Although most data seemed to indicate a superiority of the multimodal display, a few participants (3/24) praised the simplicity and intuitiveness of the visual-only display, e.g., “the visual display is more simple and easier to understand” (P18). This simplicity might have also helped reduce mental demand, which appeared to be marginally lower on average in the corresponding NASA-TLX category (Fig. 7). However, the distinct modality displays provided the strength of complementarity in situations such as when the user was static and the robot moving: the auditory information appeared to help with capture of attention (Fig. 8, and also the “Capture attention” leaf category in Fig. 10), and the visual display was of assistance in understanding the level of danger (the “Change of status” leaf category in Fig. 10).

As for the auditory display, it was found to be the most unpleasant of the displays, both in the post-study questionnaire (Fig. 8) and in statements from the interviews (“Pleasantness” leaf category, Fig. 10). 13/24 participants made comments such as “this display is unpleasant” (P20). However, this perception did not translate with any clarity into the scores of the “Irritability” category introduced as an extension of the NASA-TLX questionnaire (Fig. 7). In this category, the audio-visual condition showed a small to medium effect size difference compared to the other two conditions, indicating that participants might have found the audio-visual condition more irritating. The mildly unpleasant quality of the audio is thus thought to contribute to capturing attention efficiently. 16 out of the 24 participants spoke positively about these aspects of the auditory display, of whom 11 also indicated that the display was unpleasant. Of these 16 participants, 13 commented that the audio-visual display caught attention, which might be due to the auditory component: “it catches attention better than the visual display” (P15). While the auditory display seemed to be superior to the visual display at capturing attention (expressed clearly by participants in the post-study questionnaire), a minority of participants (4/24) also attributed this capacity to the visual display in their interview statements, suggesting that personal preference might also play a role: “visual captures attention better than in audio” (P16). Another strength of the auditory information (probably linked to its attention-grabbing capacity) was the assistance it offered for multitasking, meaning that the participant could focus on executing the task while also monitoring the changing proximity to the robot. 8/24 participants provided statements supporting this notion, such as “it helps with safety, while you are concentrating on performing a task” (P18).

7 Design and implementation limitations

One limitation comes from the constraints of current technology in recreating an ecologically valid visual field. Although HoloLens 2 has a wider field of view (54°) than its preceding version (34°), it is still much narrower than the field of view of human vision. For this reason, we used a swiveling-arrow-based visual aid as a compensatory strategy, albeit with limited effect.
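To illustrate the compensatory strategy, the following sketch shows one possible way of orienting such an arrow from the head pose and the hazard position. It is a minimal, yaw-only approximation under our own assumptions (hypothetical function names and axis convention), not the implementation used in the study.

```python
import math

def arrow_yaw_deg(head_pos, head_forward, hazard_pos):
    """Signed yaw angle (degrees) from the head's forward direction to the
    hazard, projected onto the horizontal plane. Positive values mean the
    hazard lies to the right under the assumed axis convention (y up)."""
    dx = hazard_pos[0] - head_pos[0]
    dz = hazard_pos[2] - head_pos[2]
    angle = math.degrees(math.atan2(dx, dz)
                         - math.atan2(head_forward[0], head_forward[2]))
    return (angle + 180.0) % 360.0 - 180.0  # wrap to [-180, 180]

def arrow_needed(yaw_deg, fov_deg=54.0):
    """Show the swiveling arrow only when the hazard falls outside the
    (roughly 54 degree) field of view of HoloLens 2."""
    return abs(yaw_deg) > fov_deg / 2.0

# Example: hazard behind and to the left of the user
yaw = arrow_yaw_deg(head_pos=(0.0, 1.6, 0.0),
                    head_forward=(0.0, 0.0, 1.0),
                    hazard_pos=(-1.0, 0.8, -2.0))
print(round(yaw, 1), arrow_needed(yaw))  # -153.4 True
```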

Regarding the performance capacity of HoloLens 2, a few limitations determined how the shapes’ values were calculated. HoloLens 2 (Footnote 3) has a Qualcomm Snapdragon 850 Compute Platform CPU with 4 GB of LPDDR4x system DRAM. In terms of software, it already has built-in functionalities for human understanding such as hand tracking, eye tracking and voice command recognition. These functions consume HoloLens processing resources. This, together with the rendering of colored shapes for hazard representation in the visual display, results in occasional saturation. For this reason, some calculations had to be simplified, such as using the robot’s actual position and target position to calculate the orange shape, instead of computing the area from all the points of the planned path.
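To make this simplification concrete, the sketch below approximates the orange zone as a capsule (a swept sphere) between the robot’s current position and its target position, instead of sweeping over every waypoint of the planned path. All names, coordinates and the radius are hypothetical; this is not the geometry code used in the study.

```python
import numpy as np

def point_to_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (3D numpy arrays)."""
    ab = b - a
    denom = float(ab @ ab)
    t = 0.0 if denom == 0.0 else float(np.clip((p - a) @ ab / denom, 0.0, 1.0))
    return float(np.linalg.norm(p - (a + t * ab)))

def inside_orange_zone(user_pos, robot_pos, robot_target, radius=0.8):
    """Treat the orange hazard zone as a capsule between the robot's current
    and target positions. The 0.8 m radius is an arbitrary example value."""
    return point_to_segment_distance(np.asarray(user_pos, dtype=float),
                                     np.asarray(robot_pos, dtype=float),
                                     np.asarray(robot_target, dtype=float)) <= radius

# Example check with hypothetical coordinates in metres
print(inside_orange_zone(user_pos=(0.5, 1.0, 1.0),
                         robot_pos=(0.0, 1.0, 0.0),
                         robot_target=(0.0, 1.0, 2.0)))  # True: 0.5 m from the segment
```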

The resolution of the last two constraints discussed depends on advances in the technology used in future HMD devices.

Additionally, while the use of pre-defined and consistent trajectories was beneficial for maintaining balance and control in the evaluation, it also represents a limitation of this study. Collaborative robotics can also involve dynamic and adaptive trajectories, which were not considered in our experimental setup. Future research should incorporate dynamic trajectories to evaluate the effectiveness of feedback modalities in scenarios where such trajectories are required, providing a more comprehensive assessment of their impact on safety and interaction.

Finally, for detecting collisions with the different levels of hazard, we used a framework that creates 3D bounding boxes around the user’s head, body and hands. However, the body bounding box moved according to the participant’s head rather than their actual body, since HoloLens only supports tracking of the head and hands (depicted in Fig. 2II). Although the system correctly detected all collisions, more detailed body tracking might be needed for more dangerous scenarios.
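For clarity, the following sketch illustrates the kind of bounding-box collision test described, with the body box simply hung below the tracked head position (the limitation noted above). Class names, box sizes and coordinates are hypothetical; the framework used in the study is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class AABB:
    """Axis-aligned bounding box given by its centre and full size (metres)."""
    cx: float
    cy: float
    cz: float
    sx: float
    sy: float
    sz: float

    def intersects(self, other: "AABB") -> bool:
        return (abs(self.cx - other.cx) * 2 <= self.sx + other.sx and
                abs(self.cy - other.cy) * 2 <= self.sy + other.sy and
                abs(self.cz - other.cz) * 2 <= self.sz + other.sz)

def user_boxes(head_pos, hand_positions):
    """Head, body and hand boxes. The body box is anchored below the tracked
    head because HoloLens 2 tracks only head and hands (sizes are examples)."""
    hx, hy, hz = head_pos
    boxes = [AABB(hx, hy, hz, 0.3, 0.3, 0.3),        # head
             AABB(hx, hy - 0.9, hz, 0.5, 1.5, 0.4)]  # body, hung below the head
    boxes += [AABB(px, py, pz, 0.2, 0.2, 0.2) for (px, py, pz) in hand_positions]
    return boxes

def collides_with_zone(boxes, zone: AABB) -> bool:
    """True if any user box overlaps the given hazard-zone box."""
    return any(box.intersects(zone) for box in boxes)

# Example: hazard zone around the robot (hypothetical values)
zone = AABB(0.0, 1.0, 1.0, 1.0, 1.0, 1.0)
user = user_boxes(head_pos=(0.6, 1.7, 1.0), hand_positions=[(0.4, 1.2, 1.1)])
print(collides_with_zone(user, zone))  # True: body and hand boxes overlap the zone
```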

8 Conclusion and future work

Motivated by the increasing use of fenceless robots and cobots, we wanted to develop displays capable of providing awareness about the potential hazards that could arise from working alongside a robot. In collaborative scenarios in which both robot and human worker moved in the shared space, the goal was for the user to be adequately informed to decide confidently about their positioning with respect to the robot, minimizing exposure to hazard without the robot having to reduce the efficiency of its task execution.

The performance of the participants in the study, in terms of remaining in safe proximity to the robot during collaboration, was found to be similar with any of the displays. Strengths of each display design were identified through the user study, and these modulated the UX obtained by the user.

The multimodal version of the display, presenting awareness information simultaneously through the spatial auditory and visual channels, resulted in the most effective hazard awareness display, as well as the preferred one. The triangulation of quantitative and qualitative results from the study supports the notion that more complete awareness information (complementary and redundant from both sensory channels) conveyed the reassurance participants needed to remain closer to the robot during collaboration. Such a combination of information from different modalities helped participants focus on the collaborative task while remaining aware of the level of hazard they were exposed to at each moment. Results suggest that participants in the study were aware of their changing relative position with respect to the robot, in terms of relative direction and distance. Alongside the best perception of safety provided by the multimodal display, saturation of information was not found to be an issue most of the time, although a minority of participants preferred the simplicity of single-modality displays.

Changing contextual and environmental conditions in the collaborative scenario (including lighting and noise), together with the diversity of profiles of human workers (with different preferences and perceptual capacities), may require situation awareness to rely solely on information from one of the sensory channels. In that sense, the study showed that the visual-only display helped users make quick decisions about how to move around the robot or position themselves with respect to it, when either one or both agents were moving in the shared space. The specific feature of the visual display that seemed to help with this was the visible color-cast shape that enveloped the robot and evolved dynamically as the robot executed its trajectories. The visual information was also the most effective at signaling transitions between regions with different levels of hazard on the three-level discrete scale used.

As for the auditory-only display, its main strength was its capacity to capture the attention of the user. The capacity of the auditory signals to be noticed while the user was focusing on the task was probably due to their mildly unpleasant quality. Although results did not indicate that such unpleasantness was excessive, users’ acceptance of listening to the auditory display over the long term is an important safety consideration, and careful tuning of the auditory display parameters should be attempted to strike an optimal balance between effectiveness and acceptance of the auditory hazard awareness display.

A final important consideration must be made regarding the suitability and acceptability (e.g., from a safety regulations point of view) of granting the human end user control over the relative position and movement between worker and robot. We propose that good conscious awareness of the situation relative to a robot and its actions opens up a desirable avenue for human workers to have control over how the collaboration should evolve. However, we are very aware that this most sensitive question of safety at work requires that wrong judgements by the worker can never lead to a situation that might result in harm. Thus, beyond the motivation of reaching the best UX in HRC, external systems should still ultimately ensure safety (Footnote 4, Footnote 5).