1 Introduction

What constitutes a sensor-robot system is expanding. There are a growing number and variety of systems that allow people to see and act at a distance [1]. Despite the dizzying pace of innovation and deployment of these human-sensor-robot systems, there are few formally derived metrics to facilitate comparison across designs [2]. Researchers have observed the introduction of teleoperated sensor-robot systems in many different work domains [3]. The observations from these studies can be organized into three basic findings [1]:

  1. Independent of sensor platform type, practitioners have difficulty understanding and acting in a remote environment once their platform has left their line of sight. They often must stop to consider whether their platform is capable of performing an action in its surrounding environment.

  2. In response to this difficulty, practitioners develop ad hoc operating procedures, or kludges, through trial and error to help them carry out their goals. Practitioners adapt their behavior to fill shortcomings in the platform’s design.

  3. While the kludges operators create are often effective, they are slow, error prone, and require a high degree of concentration to execute.

These findings suggest that operators are unable to perceive the relationship between the sensor-robot platform and the environment when operating at a distance. The uncertainty that operators experience while teleoperating a sensor-robot platform arises from two sources. First, uncertainty grows from a limited ability to understand the layout of the robot’s environment through sensors. Second, operators are unable to perceive the action capability of the robot through the robot’s sensor feeds. Judgments about what is traversable, what is reachable, or what is passable are effortful and slow, even in ideal visual environments.

In perceptual psychology the concept of a fit between an actor’s capability and the surrounding environment is called an affordance. Traversability, reachability, passability are all examples of affordances. The observed difficulties in teleoperation can be framed in terms of difficulties perceiving affordances through the sensor system given current platform designs. Measuring a human-sensor system’s ability to perceive affordances should be a useful and diagnostic metric to compare the effectiveness of different platform designs.

The theory of affordances in Ecological Perception holds that organisms understand their physical surroundings in part by how the scale of that environment fits their body’s own capabilities [4]. For example, when trying to reach for an object you do not think about whether your arm can reach it. You immediately perceive the reachability of the object. Psychologists have documented humans’ ability to perceive the affordances around them and have created well-accepted experimental methods and measures [5–7]. The current study adapts a method from the psychophysics and perception literature [8] to measure a human-robot system’s ability to perceive an object’s reachability under several different visual conditions representative of current and proposed sensor-robot systems.

Study participants were presented with a camera feed from a simulated explosive ordnance disposal robot and asked whether or not the mechanical arm attached to the robot’s chassis could reach a target object. Using a within-subjects design, participants were exposed to three conditions. Each condition presented different visual cues roughly simulating a different type of operating environment. The data were analyzed by computing psychometric functions for each of the three conditions and using them to compare psychophysical performance across the three conditions.

2 Related Work

The following section will outline previous work on several topics, including previous studies measuring teleoperated sensor-robot system performance, the link between teleoperation and ecological perception, and how the perception of affordances is measured in the ecological perception literature.

2.1 Research on Robotic System Effectiveness

Existing research on the effectiveness of teleoperated systems can be broken into two large categories: observational field studies and staged world studies. Robotics field studies describe the practicalities of operating these platforms outside of the lab. The work of [9] represents the first time Unmanned Ground Vehicles (UGV) were deployed for an Urban Search and Rescue task in the field. This extensive report describes the effectiveness of the physical platforms in the difficult, broken terrain of the collapsed World Trade Center. In addition the report details the challenges operators had in controlling the platform and interacting with other groups of stakeholders. One of the co-authors, Dr. Robin Murphy, has built a large corpus of studies examining the challenges associated with operating robotic platforms in the field [3].

Staged world studies bring robotic platforms into the laboratory, adding a layer of control not available in the field. The National Institute of Standards and Technology (NIST) conducted several studies of unmanned ground vehicle effectiveness in an attempt to establish common metrics for their performance. These studies generally involve expert operators using a teleoperated robot to navigate obstacle courses of varying fidelity. These operators are tasked with finding targets in the environment such as simulated disaster victims [10]. Human-robot performance is gauged by metrics like task completion time, number of targets found, and operator mental workload.

One staged world study asked operators to use robotic platforms outfitted with a manipulator arm to pick up a cylinder and place it into a similarly sized aperture such that the cylinder would not fall back out. While the results of the study were not published, the authors observed expert operators struggling with this simple task. Operators would often spend minutes not moving the manipulator arm, but attempting to understand the position of the arm relative to the surrounding environment (see Fig. 1).

Fig. 1. Operator using a robotic platform with manipulator arm to pick up a cylinder and place it into a similarly sized aperture such that the cylinder would not fall back out in NIST test case. Performance was very poor in general and virtually impossible when the teleoperator could not directly see the robot arm. Note: the two images shown here are of two different robotic platforms.

None of the existing human-robot interaction metrics capture the difficulties observed in teleoperated sensor-robot systems. Perceiving a remote environment through sensors on robotic platforms challenges the teleoperator. Measuring these challenges reveals why current designs are limited and suggests new design directions.

2.2 A Perceptual Model of Human-Sensor System Interaction

The work of [1] has proposed shifting the way we think of teleoperated systems. Rather than designing a platform as an independent technological artifact, one can think of the robot as a stand-in for a human operator’s own perceptual system in a remote environment. Thus the platform’s primary purpose becomes supporting the human’s ability to perceive the remote environment as quickly and effortlessly as possible. Any other goals, such as the physical manipulation of, or movement through, the remote environment can only be effective and efficient once this is accomplished. Reframing teleoperated systems as extended perception opens up new design directions based on how humans perceive their immediate surroundings, and new ways to measure a design’s effectiveness.

One relevant perspective on visual perception is Gibson’s theory of Ecological Perception [11]. Within Ecological Perception the concept of an affordance describes the fit between the capabilities of an actor and the constraints of the surrounding environment. The theory of affordances also provides a solid theoretical and methodological foundation to anchor the study of human-sensor-robot system perception. Gibson posits that visually guided action in the world requires an understanding of the perceived match between the properties in the environment and the known properties of the acting system, as described by [4, 12]. Organisms navigating an environment apprehend these affordances continuously, accessing perceptually defined attributes of the environment in relation to their own capabilities.

The work of [6] explores the affordance of passability, comparing a participant’s shoulder width to the width of an aperture. Participants were highly proficient in judging not the absolute width of the aperture, but the relative fit between their own shoulder width and the width of the aperture. Similarly, several studies have measured the ability of participants to discriminate reachable objects from unreachable objects [7]. These studies show that the participants are quite accurate in making a perceptual judgment about the reachability of an object, but performance begins to degrade when participants are given time to ruminate about the object’s distance and its relation to arm length [5].

Many modern studies in the psychometric literature, and specifically the affordance literature, use a psychometric function as a compact way to describe a participant’s performance. The function relates the proportion of positive responses to a stimulus level: the slope of the function represents the participant’s precision, and the midpoint, or 50 % point, represents the participant’s accuracy [13]. Other parameters can also be estimated, including the participant’s guessing, or lapse, rate.
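To make the roles of these parameters concrete, the following is a minimal sketch of one possible psychometric function. It assumes a logistic form, which is only one of several shapes used in the literature, and the parameter names (`midpoint`, `slope`, `guess`, `lapse`) are illustrative rather than taken from the cited work.

```python
import math

def psychometric(x, midpoint, slope, guess=0.0, lapse=0.0):
    """Proportion of positive responses at stimulus level x.

    Assumed logistic form (an illustration, not the cited authors' model):
      midpoint  stimulus level where the underlying logistic equals 0.5
      slope     steepness; a larger magnitude means a more precise observer
      guess     lower asymptote (hypothetical name)
      lapse     distance of the upper asymptote below 1 (hypothetical name)
    """
    core = 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))
    return guess + (1.0 - guess - lapse) * core

# Example: proportion of "reachable" answers falls as distance grows,
# so the slope is negative. Distances are in arbitrary units.
for distance in (0.5, 1.0, 1.5):
    print(distance, round(psychometric(distance, midpoint=1.0, slope=-8.0), 3))
```

With a negative slope the function falls from near 1 at short distances to near 0 at long distances, crossing 0.5 at the midpoint when `guess` and `lapse` are zero.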

2.3 Measuring Affordances in Human-Sensor Systems

Several recent studies have investigated the ability of robotic system operators to perceive affordances in a remote environment. These studies examined the perception of an aperture’s passability, building on the work of [6]. The work of [14] examined the differences in passability perception when operating a platform inside and outside line of sight. Unsurprisingly, participants were more accurate and consistent when operating the platform within line of sight. Additionally, [14] investigated the effects of camera height on passability perception, finding that both camera height and distance to an aperture affected participants’ perception of passability.

Inspired by the work of [6, 9], [15] investigated how difficulties in perception can contribute to Murphy’s observation that platforms would frequently become stuck in apertures during urban search and rescue. Results showed that comparing solely the width of the platform to the width of the aperture was insufficient to ensure good driving performance. Rather, consistent with [6], the operator needed to compare the aperture to the width of the platform plus a safety margin.

3 Experiment

The current study examined participants’ ability to perceive the reachability of a target at different distances from a virtual robotic platform featuring a mechanical arm. During each trial participants were shown a feed from the robot’s camera and responded with a binary reachable/unreachable answer in an adaptive psychophysical procedure.

3.1 Participants

Twelve participants were recruited to participate in the study, ten male and two female. Each participant attended three 45–60 min data collection sessions within the space of a week. All participants were between 24 and 28 years old and were screened for experience with video games or other interaction with 3D virtual environments (e.g. CAD software).

3.2 Design

The current study used a within-subjects design, exposing all participants to three different conditions. The ordering of these conditions was counterbalanced between participants to counter any learning effects. Each condition contained a different number of visual cues to depth. The conditions were labeled low, medium, and high.

The low condition contained only sparse visual cues, including target size, target position within the camera’s field of view, and shading. The target object had a solid color with no texture applied. This condition was created to loosely mimic operating a sensor platform underwater, where there is no consistent horizon, indirect lighting and few useful cues to the scale of objects. The medium condition added a ground plane to the environment, providing a landmark with which to judge the height of the target. The ground plane included a texture of random noise. The visual cues in the medium condition resembled a deconstructed environment like a collapsed building, where there is no direct lighting and few useful landmarks. The high condition added a texture to the target and direct lighting, which resulted in a shadow being cast by the target and the mechanical arm. This high condition most closely mimicked operating a robotic platform outdoors in direct sunlight (see Fig. 2).

Fig. 2. A) Three different viewing conditions used in the study based on actual experiences in rescue robotics. B) The robot-sensor in the simulated environment for the reachability task.

In order to remove any confounding effects on distance perception, many aspects of the virtual environment were randomly varied from trial to trial. These aspects included target height above the ground plane, target diameter, target color, orientation of the direct light source, ground plane texture scale and orientation, and the configuration of the mechanical arm. The mechanical arm required additional constraints because it provided an important landmark from which to judge reachability. In order to eliminate the arm as a confound it was important not only to keep it in view for each trial, but to also have it fill roughly the same percentage of the sensor feed. To achieve this consistency 20 arm configurations were predefined prior to the study, which were then selected at random at the beginning of each trial.

In all conditions the distance of the target from the base of the robotic platform was systematically varied using a weighted and transformed staircase procedure [16, 17]. This adaptive method changes the stimulus strength based on the participant’s response from the preceding trial. Over numerous trials, the algorithm converges on a threshold along the participant’s psychometric function, for example the point at which the participant responds that the target object is reachable 70 % of the time. This point is determined by the algorithm’s up/down rule and the weighted distance the target is moved. The up/down rule determines how many consecutive responses must be the same before the target distance or stimulus strength is changed. Each time a participant’s answer changes from reachable to unreachable, or unreachable to reachable, the target distance is recorded as a reversal point. The algorithm terminates once it records 18 reversals [17]. During analysis, these reversal points are averaged together to estimate the predetermined threshold point.
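The adaptive logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's software: the class name `Staircase`, its parameters, the step sizes, and the simulated logistic observer are all hypothetical. A reversal is taken here to be a change in the direction the target moves, a common convention, and "weighting" is represented by allowing different outward and inward step sizes.

```python
import math
import random

class Staircase:
    """Transformed up/down staircase over target distance (illustrative)."""

    def __init__(self, start, n_reachable, n_unreachable,
                 step_out, step_in, max_reversals=18):
        self.distance = start
        self.n_reachable = n_reachable      # consecutive "reachable" answers needed to move the target farther
        self.n_unreachable = n_unreachable  # consecutive "unreachable" answers needed to move it nearer
        self.step_out = step_out            # outward step size (weighting: may differ from step_in)
        self.step_in = step_in
        self.max_reversals = max_reversals
        self.run_reachable = 0
        self.run_unreachable = 0
        self.last_direction = 0             # +1 outward, -1 inward, 0 before any move
        self.reversals = []

    @property
    def finished(self):
        return len(self.reversals) >= self.max_reversals

    def update(self, said_reachable):
        """Register one response and move the target if the rule is met."""
        if said_reachable:
            self.run_reachable += 1
            self.run_unreachable = 0
            if self.run_reachable < self.n_reachable:
                return
            direction, step = +1, self.step_out
        else:
            self.run_unreachable += 1
            self.run_reachable = 0
            if self.run_unreachable < self.n_unreachable:
                return
            direction, step = -1, self.step_in
        self.run_reachable = self.run_unreachable = 0
        if self.last_direction and direction != self.last_direction:
            self.reversals.append(self.distance)  # record the reversal point
        self.last_direction = direction
        self.distance = max(0.0, self.distance + direction * step)

    def threshold(self):
        """Average of the recorded reversal points."""
        return sum(self.reversals) / len(self.reversals)

def simulated_observer(distance, rng, boundary=1.0, steepness=10.0):
    """Hypothetical participant: says "reachable" less often as distance grows."""
    p_reachable = 1.0 / (1.0 + math.exp(steepness * (distance - boundary)))
    return rng.random() < p_reachable

rng = random.Random(0)
stair = Staircase(start=0.5, n_reachable=3, n_unreachable=1,
                  step_out=0.05, step_in=0.05)
while not stair.finished:
    stair.update(simulated_observer(stair.distance, rng))
estimate = stair.threshold()
print(len(stair.reversals), round(estimate, 3))
```

In this sketch the run terminates after 18 reversals, matching the stopping rule in the text, and the averaged reversal points serve as the threshold estimate.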

While this method has proven accurate at estimating the position of a psychometric function [16], it does not estimate the slope, a measure of a participant’s precision. In order to compare the precision between conditions without a slope value, the current study estimated two thresholds on each participant’s psychometric function and took the difference. To accomplish this, staircase algorithms were run in interleaved pairs, each pair consisting of one algorithm using a 3:1 up/down rule and another using a 1:3 rule. The use of inverse up/down rules resulted in two points symmetrical around the function’s 50 % point, the perceived boundary between what is reachable and unreachable.
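The reasoning behind using the distance between two thresholds as a stand-in for slope can be illustrated with a short calculation. The sketch below assumes an unweighted transformed rule, for which a staircase requiring n consecutive identical responses converges where that response's probability p satisfies p^n = 0.5, and it assumes a logistic psychometric function. The study's weighted procedure may target different probabilities, the text does not specify which rule yields which threshold, and the function names and observer parameters here are hypothetical.

```python
import math

def transformed_rule_target(n_consecutive):
    """Response probability at which an unweighted n:1 rule settles (p**n == 0.5)."""
    return 0.5 ** (1.0 / n_consecutive)

def logistic_level(p, midpoint, slope):
    """Stimulus level at which a logistic function with this midpoint and slope equals p."""
    return midpoint + math.log(p / (1.0 - p)) / slope

p_inner = transformed_rule_target(3)   # about 0.794
p_outer = 1.0 - p_inner                # about 0.206, the mirror-image rule

# Hypothetical observer: boundary at 1.0 arm lengths, negative slope because
# "reachable" responses become rarer as distance increases.
midpoint, slope = 1.0, -10.0
inner = logistic_level(p_inner, midpoint, slope)
outer = logistic_level(p_outer, midpoint, slope)
spread = outer - inner

print(round(p_inner, 3), round(p_outer, 3))
print(round(inner, 3), round(outer, 3), round(spread, 3))
```

Under these assumptions the two thresholds sit symmetrically around the 50 % point, and their separation is inversely proportional to the magnitude of the slope, so a smaller distance between thresholds indicates a more precise observer.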

3.3 Apparatus

The current study was conducted using a MacBook Pro laptop as a computer workstation for the participant. The screen measured 19 inches on the diagonal and used a resolution of 1920 × 1080. Participants gave input using a standard three-button mouse. Each participant was run in an area isolating them from outside stimuli.

The testing software was built using a 3D videogame authoring engine, Unity 3D. This engine handled the rendering of the 3D environment and the recording of test data. The software was engineered to maintain a rate of 60 frames per second or higher.

Virtual Robotic Platform.

The virtual robotic platform used in the study simulates the Talon built by QinetiQ North America. The platform featured a mechanical arm with four controllable servomotors. The camera was positioned on a stalk above, behind, and slightly to the left of the mechanical arm. The orientation of the camera was such that the chassis of the robot was visible at the bottom of the feed.

3.4 Procedure

Each participant was involved in three data collection sessions. Each session covered one experimental condition, and took approximately 45 min to an hour. Each session took place on a separate day, with all sessions occurring within a seven-day period.

Each session consisted of three phases: familiarization, training, and data collection. The familiarization phase helped the participant become familiar with the capabilities of the mechanical arm by asking participants to use the arm to ‘touch’ a target object with the mechanical arm’s end effector. The familiarization phase provided scaffolding for the participants by structuring which servos could be controlled from trial to trial. The first trial restricted control to a single servo, teaching the participant how to use the interface. The scaffolding continued over the next three trials, each of which unlocked an additional servo. At the end of these four trials all four servos were unlocked. All subsequent trials required the participant to manipulate all four servos in order to touch the target.

The training phase allowed participants to practice the data collection reachability task. During this training phase participants received feedback about the accuracy of their response, correct or incorrect. This phase asked the participant to judge whether or not a target was reachable at different distances. The participant was first presented with a mask screen counting down from 3 to 1. At 1 the mask screen was replaced with the camera feed, along with two buttons labeled “reachable” and “unreachable” at the bottom of the screen. The participant had 3 s [5] to select an answer before the feed was replaced with a black screen. At this point participants had unlimited time to select an answer. Upon selecting an answer a red or green box, labeled with the words “incorrect” or “correct” respectively, appeared, informing the participant of whether they were correct. After 3 s, the mask for the next trial appeared and started to count down.

The procedure of the data collection phase was identical to the training phase, but without the feedback about the participant’s correctness. The data collection in each session was divided into three sections, separated by rest periods of up to 2 min. Each session presented target distances based on a pair of staircase algorithms. Each trial alternated which algorithm was used so as to obscure the pattern of sampling.

After completing each session the participants were debriefed and asked if they had any questions. Additionally, they were asked not to discuss their experience with anyone until the completion of the study.

Fig. 3. Individual performance data. The left panels show the difference between each participant’s two thresholds in each condition. The right panels show each participant’s 50 % point, their perceived reachability boundary, and highlight its displacement from the true reachability boundary. Additionally the right panels show each participant’s inner and outer threshold.

4 Analysis and Results

Following similar approaches to assessing affordances, data analysis in the current study involved fitting a sigmoid function to the binomial data using a logistic regression. The sigmoid function measures the human-robot-sensor system’s ability to perceive the reachability of a target object. This sigmoid function is the psychometric function of the system. The psychometric function for the current study is defined by position and slope, estimating the participant’s accuracy and precision respectively. The analysis first compared each participant’s performance across the three conditions. Then, the analysis was performed across all participants and conditions to examine trends.
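For readers unfamiliar with this kind of fit, the following minimal sketch shows how a two-parameter logistic regression yields a position and a slope from binary reachable/unreachable responses. It uses a plain Newton-Raphson update written for illustration; it is not the analysis code used in the study, the function name `fit_logistic` is hypothetical, and the response counts are made up for the example rather than drawn from the study data.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b * x))) by Newton-Raphson.

    Returns the intercept a and slope b. Illustrative only: no convergence
    checks beyond a singular-matrix guard, and it assumes the data are not
    perfectly separable.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p              # gradient of the log-likelihood
            g_b += (y - p) * x
            h_aa += w                 # entries of the information matrix
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        if det == 0.0:
            break
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Made-up example data: 10 trials at each distance (in arm lengths) and the
# number of "reachable" responses at that distance.
distances = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
reachable_counts = [10, 10, 9, 8, 5, 2, 1, 0, 0]
trials_per_level = 10

xs, ys = [], []
for d, k in zip(distances, reachable_counts):
    xs += [d] * trials_per_level
    ys += [1] * k + [0] * (trials_per_level - k)

intercept, slope = fit_logistic(xs, ys)
boundary = -intercept / slope   # 50 % point: the perceived reachability boundary
print(round(boundary, 3), round(slope, 3))
```

The 50 % point is recovered as the ratio of the negated intercept to the slope, and the slope itself is negative here because "reachable" responses become less likely as distance increases.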

The first analysis performed a repeated measures ANOVA on the estimated 50 % position. The ANOVA was performed on a logistic regression with a within-subjects variable of condition (low, medium, and high). The results are shown in panel A of Table 1. Nine of the 12 participants showed a significant difference for either high, low, or all three conditions, while one participant showed a significant difference for medium. That the differences were mostly confined to the high or low condition might indicate that the sources of information for perceiving depth were not sufficiently different across the conditions.

Table 1. The table on the left (panel A) shows the estimated 50 % thresholds by condition for each participant. The table on the right (panel B) shows the estimated slope values by condition for each participant. The significantly different conditions at the p < 0.05 level are noted by an asterisk (*).

A second repeated measures ANOVA was performed on the slope estimates, which are shown in panel B of Table 1. Similar to the previous analysis, the ANOVA was performed on a logistic regression with condition (low, medium, and high) as the within-subjects independent variable. Matching the threshold data, both participants 7 and 8 show no difference in their slope estimates. Participant 9’s slope estimates are significant, but their order does not match the conditions: medium was their poorest performing condition, followed by low and then high. The remaining 9 participants’ results matched the predicted ordering, but without significance in some cases. This is potentially due to the visual cues not being different enough across conditions or a possible saturation of the psychophysical performance of the human-sensor-robot system.

The second phase of the analysis examined the threshold positions and the slope estimates for each condition but across all participants. The thresholds were examined individually to more closely examine the effect across all conditions. A repeated measures ANOVA was used to test the within-subjects factors of threshold (inner vs. outer) and condition (low, medium, and high). The analysis revealed a main effect of threshold, F(1,11) = 52.2, p < .0001, with an effect size of ηp² = 0.32 (generalized eta-squared), but not of condition. After correcting the degrees of freedom using a Greenhouse-Geisser estimate of sphericity (ε = 0.54), an interaction effect between condition and threshold emerged, F(2,22) = 7.978, p < .01, ηp² = .03. Exploring this further, Tukey post-hoc 95 % confidence intervals show a difference between all three outer thresholds. This can be interpreted as a main effect for outer threshold. Tukey 95 % confidence intervals did not reveal a difference between the inner thresholds.

The slope estimates were compared next; they were calculated as the difference between the inner and outer threshold points for each condition across all participants. A repeated measures ANOVA was performed on the slope estimates. Similar to the previous analyses, the ANOVA was performed on a logistic regression with condition (low, medium, and high) as the within-subjects independent variable. The results for the slope estimates for the three conditions are low (mean = 0.84, SD = 0.45, FLSD = 0.21), medium (mean = 0.63, SD = 0.41, FLSD = 0.21), and high (mean = 0.44, SD = 0.18, FLSD = 0.21). The repeated measures ANOVA showed a significant difference between the low and high conditions. The medium condition was not significantly different from the low or high conditions. Similar to the results from the per-participant analyses, the lack of difference between the medium condition and the high or low conditions is potentially due to scaling, where the selected cues to embed in each condition did not provide a large enough effect.
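The pattern of significance reported above can be checked against the reported means with simple arithmetic. The sketch below assumes that FLSD denotes Fisher's least significant difference and that two conditions differ when the absolute difference between their means strictly exceeds that value; both are assumptions about the reporting convention, and the calculation uses only the rounded values given in the text.

```python
from itertools import combinations

# Reported mean distance between thresholds, by condition (from the text).
means = {"low": 0.84, "medium": 0.63, "high": 0.44}
lsd = 0.21  # reported FLSD, assumed to be Fisher's least significant difference

def significantly_different(a, b):
    """Assumed rule: the mean difference must strictly exceed the LSD."""
    # Round to the two decimals reported, to avoid floating-point noise.
    return round(abs(means[a] - means[b]), 2) > lsd

for a, b in combinations(means, 2):
    diff = round(abs(means[a] - means[b]), 2)
    print(a, "vs", b, diff, significantly_different(a, b))
```

Under this reading only the low and high conditions differ by more than 0.21 (a difference of 0.40), while the low-medium difference (0.21) and the medium-high difference (0.19) do not exceed it, which is consistent with the results stated in the text.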

Fig. 4. Inner and outer thresholds averaged over all participants plotted by condition. Black horizontal lines represent the 50 % point for each condition. Note the sharp drop in the outer threshold, which suggests that participants became more precise as more visual cues were added.

Fig. 5. Distance between thresholds plotted with 95 % confidence intervals by condition. Distance normalized to length of the mechanical arm.

These results demonstrate that the procedure can discriminate performance differences across different human-robot-sensor systems. These differences are based on the ability to estimate psychometric functions for a human-robot-sensor system across environments with different visual cues to depth information. More specifically, the results show that the ability of a human-robot-sensor system to perceive the reachability of a target object degrades as the visual cues to depth in the environment become sparser (Figs. 4 and 5).

5 Discussion

This work provides a new metric for quantitatively assessing one performance dimension of human-sensor-robot systems: the ability to perceive affordances. The metric is derived from formal and well-established methods in psychophysics transformed for human-sensor-robot systems. The study shows that the perception of affordances, in this case reachability, can be measured and that performance at perceiving affordances varies with task conditions. In this study, the metric was sensitive to changing visual cues about the structure of the environment, cues that explain how perception of the environment is more difficult and less accurate under restricted visual conditions similar to those encountered in rescue robotics. As a result, the study provides a new quantitative, precise measurement tool for use in assessing the perceptual side of human-robot interaction. Moving forward, this measurement tool can be used to compare different human-robot-sensor system configurations.

The method presented is a complete measurement tool; however, it can be expanded in several ways to increase its fidelity to real-world operations. For instance, the simulated robot environment can be expanded to include temporal properties of the target object. In addition, the fidelity of the simulated robot could be expanded. Depending on the objective of the measurement this could include both the dynamics and kinematics of the robot, but could also include a higher fidelity interface to more closely match the operator-robot interaction. The method could also be expanded to include other types of affordances, providing a more holistic assessment.

Future studies can also expand the method to accommodate different psychometric procedures. The psychometric literature contains many methods to estimate participants’ psychometric function including a large family of adaptive procedures, and the method of constant stimuli. Comparisons of alternative methods to estimate psychometric functions will allow identification of which method maximizes the ability to discriminate performance reliably. Additional studies are underway that use the current method and metric to compare different sensor-robotic platform designs under different visual conditions.