1 Introduction

The future Unmanned Aerial Vehicle (UAV) operator will no longer directly control a single vehicle or its payload, but will instead manage a group of highly autonomous UAVs. Within this new supervisory role, the operator’s primary responsibility will be one of determining how to allocate the UAVs to meet multiple mission requirements within a dynamic and uncertain environment. Implementing this new paradigm of UAV management requires an increase in autonomy as well as an understanding of how a human performs in this supervisory capacity. Since multi-objective decision making is a primary aspect of future UAV operations, conducting research within a realistic and complex testing environment which enables investigation of various decision support concepts and automation tools is critical to the successful implementation of supervisory control.

The first section of this paper will discuss the development of the Supervisory Control Operations User Testbed (SCOUT™), an experimental platform for investigating the human performance challenges associated with supervisory control of multiple UAVs. The second section will highlight results from initial research conducted within SCOUT, addressing workload and situation awareness measurement techniques. The third section will discuss the development of a decision performance assessment algorithm as well as a new proposed approach for quantifying the difficulty of multi-objective asset allocation decisions within SCOUT. In addition, section three will present a method for providing an operator with customized decision support based upon the amount of risk he would like to take. The paper will conclude with a summary of future research that can be conducted within SCOUT, leveraging these new algorithms to enable mixed initiative decision support.

The Supervisory Control Operations User Testbed.

The U.S. Naval Research Lab developed the Supervisory Control Operations User Testbed (SCOUT) to investigate future challenges that human operators will experience while managing missions involving multiple autonomous systems. SCOUT contains representative tasks that a future UAV supervisory controller will likely perform, assuming advancements in automation. The testing environment is also instrumented with physiological sensors, which are integrated with the SCOUT and user performance data, in order to gather a more complete understanding of a user’s state, such as the operator’s mental workload, or level of awareness (see Fig. 1). These constructs are inferred via proxy measures such as: an individual’s pupil size, which is correlated with mental workload (e.g., [1]); eye gaze patterns, which can serve as indicators of situation awareness and attention allocation (e.g., [2]); and heart rate variability, which is inversely correlated with stress levels (e.g., [3]).

Fig. 1. SCOUT operator interacting with the testbed while eye tracking data is being collected

SCOUT operators engage in Intelligence, Surveillance and Reconnaissance (ISR) missions in which they are responsible for mission planning as well as airspace management, updating UAV flight parameters, responding to communications and re-planning as new mission information is received. The operator’s primary responsibility is to decide how to dynamically allocate unmanned assets to achieve mission goals. Specifically, operators must decide where to send each of their three UAVs to search for targets of varying priority levels, uncertainty (with respect to intelligence information about location, which impacts time and ability to actually find targets), and deadlines. In addition, operators must respond to requests for information and commands via chat boxes, as well as monitor airspace, fuel levels and sensor feeds. In order to motivate users, SCOUT operators receive points for finding targets and for providing timely and accurate information to mission command, and they lose points for violating restricted airspace. All tasking is driven by pre-scripted scenario files, and event timing within the scenario can be used to increase or decrease the workload requirements of the operator.

2 Previous Research Within SCOUT

Human subject data collection was conducted within SCOUT with twenty volunteers. Each individual wore a Zephyr Bioharness, which recorded heart rate data, in addition to being calibrated on the SmartEye Pro system, which recorded eye gaze, pupil size and eyelid data. Prior to engaging with SCOUT, participants received approximately thirty minutes of training which consisted of viewing short videos, followed by interactive assessments to ensure a high level of comprehension. Following training, participants engaged in two SCOUT sessions. Each session was comprised of a planning phase, followed by three mission execution blocks. Participants were given up to ten minutes for the planning phase and when they were ready to continue, they began the mission.

Each mission execution session took approximately 18 min and comprised three six-minute blocks of varying difficulty (i.e., task load), always starting with a medium level of difficulty. Block difficulty was manipulated through the frequency of tasking (e.g., new target opportunities and chat requests/commands), and the sessions were counterbalanced such that half the participants received Session A first, while the other half received Session B first, as seen in Table 1. During the easy, medium and hard blocks, a new task was presented every 75, 45 and 15 s, respectively. Example tasks include chat requests to change the altitude of a UAV (“Increase altitude of UV-72 by 73”) or to relay information about a UAV (“Provide speed of UH-28”). Task frequency was systematically varied in order to observe its impact on human performance as well as on eye tracking and heart rate metrics.

Table 1. Sessions and Difficulty Blocks

Participants also experienced Situation Awareness (SA) freeze probes once per block. During these probes, everything on the SCOUT interface disappeared except for the map display, and the participant was instructed to recreate the position of the three UAVs and any active targets which were last present on the map. Additionally, participants were told to provide the current target each UAV was pursuing and how many points that target was worth. Lastly, they were asked whether the UAV would be able to complete its search within the next ten minutes. See Fig. 2 for a depiction of the SA probe.

Fig. 2. Example SCOUT Situation Awareness freeze probe

The mission task performance measure of interest in this experiment (responses to tasking via chat) revealed increases in error and decreases in the percentage of tasks completed associated with increasing levels of task load [4, 5]. Following the same pattern, results on the SA probe showed increases in the error distance for map object placement (i.e., larger distances between where participants placed targets and UAVs and where they actually were) as a function of increases in block difficulty level. This same result was found to hold for the UAV-target pair accuracy on the SA probe, with decreases in performance on the hard condition compared to the easy one [6].
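The map placement error described above can be illustrated with a short sketch. It assumes that positions are available as planar (x, y) map coordinates and that only objects the participant actually placed are scored; the function name and data layout are hypothetical and are not taken from SCOUT.

```python
import math


def mean_placement_error(placed, actual):
    """Mean Euclidean distance between placed and true map positions.

    placed, actual: dicts mapping an object id (UAV or target) to an
    (x, y) tuple in map units. Both the layout and the choice to score
    only objects present in both dicts are assumptions of this sketch.
    """
    common = placed.keys() & actual.keys()
    if not common:
        raise ValueError("no objects to score")
    return sum(math.dist(placed[k], actual[k]) for k in common) / len(common)
```

A larger return value corresponds to poorer recall of where the UAVs and targets were when the display froze.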

Results from this data collection also demonstrated how eye tracking and heart rate metrics align with performance metrics and can be used to infer a user’s current cognitive state. Specifically, increases in participants’ task load were associated with increases in the mean and maximum of participants’ pupil size, derived over a six-minute block of data [4]. Additionally, pupil size standard deviation was statistically significant in differentiating the task load levels, and when a subset of the four top and four bottom performers was analyzed, the standard deviation of the bottom performers was much greater than that of the top performers for each task load level (see [4] for more details). This suggests that greater variability in pupil size values may reveal that a user is struggling with a task, while smaller fluctuations suggest greater task comprehension or mastery. This is consistent with research by Ahern and Beatty [7], which demonstrated that students with lower scores on the Scholastic Aptitude Test (SAT) exhibited greater pupillary dilation than those with higher scores when given math problems.

Additional analysis conducted on the eye gaze data revealed that participants had significantly fewer fixations within the relevant display region, the map, one and two minutes prior to the SA probe during the high level of task load, compared to the low task load [8]. In addition, the fixation duration was significantly shorter during the high load level. Participants also exhibited smaller spread of fixations, or dispersion, one minute prior to the SA probe during the high task load level, compared to the low and medium levels [6]. These findings are significant since they correspond to SA performance data, demonstrating how fixation metrics calculated within the appropriate areas of interest can be predictive of SA. As such, eye tracking can provide a dynamic measure of SA without subjecting an individual to invasive freeze probes, which potentially add confounding effects (due to interruptions) and can only provide a measure of SA at that specific point in time.

Electrocardiogram data was also collected during experimentation, and heart rate variability (HRV) data was analyzed to verify the inverse correlation with mental workload [3]. Results showed higher levels of HRV in the planning phase than in the first block of each session, indicating higher levels of mental workload during Block 1 than during the planning phase, which is consistent with the increased time pressure and dynamics of the mission execution environment during Block 1. Furthermore, HRV was greater in the second session than in the first, indicating higher levels of mental workload in the first planning session compared to the second session, likely attributable to learning effects and familiarity with the task (see [9] for more details). During mission execution, however, HRV was not able to distinguish between the task load levels.

The various metrics discussed above can be used in combination to help infer a user’s state (i.e., workload, situation awareness) as they engage in a task. The authors have started to utilize machine learning methodologies to predict a SCOUT user’s workload from 60 s chunks of eye tracking and performance data [10]. The objective is to identify when an operator is outside his or her optimal level of workload (e.g., overloaded) in order to know when to provide decision support, relief, or even additional tasking in the case of underload, and ultimately to prevent errors.
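To make the idea of 60 s chunks concrete, the following sketch groups pupil size samples into fixed windows and computes the mean, maximum and standard deviation for each window. The feature set mirrors the pupil metrics reported above, but it is an assumption here that these would be the inputs to a classifier; the actual features and models are described in [10]. The function name and sample layout are hypothetical.

```python
from statistics import mean, pstdev


def pupil_features(samples, window_s=60.0):
    """Summarize pupil size per fixed-length time window.

    samples: iterable of (timestamp_s, pupil_size) pairs (assumed layout).
    Returns {window_index: {"mean": ..., "max": ..., "std": ...}}.
    The population standard deviation is used so that a window holding
    a single sample still yields a value (0.0).
    """
    windows = {}
    for t, size in samples:
        windows.setdefault(int(t // window_s), []).append(size)
    return {
        idx: {"mean": mean(vals), "max": max(vals), "std": pstdev(vals)}
        for idx, vals in sorted(windows.items())
    }
```

Each per-window feature dictionary could then be paired with the task load level in effect during that window to form one training example.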

3 Future Decision Making Research Within SCOUT

The primary task within SCOUT is the route planning task in which the operator develops a path plan for sending each of her three UAVs to find active targets, i.e., a plan that specifies the sequence of targets to search within their opportunity windows. Each of the UAVs is equipped with a sensor that actively searches for targets on the ground, but UAVs differ in capabilities, such that some are faster and have better sensor ranges than others, which facilitates more rapid location of targets. As this is a search task, the exact latitude and longitude of each target are uncertain, and this area of uncertainty is represented on the main map with a white circle surrounding the target (as seen in Figs. 3 and 4). In determining the best plan, the operator must consider each UAV’s current position, velocity and sensor capabilities, and each target’s value and location uncertainty, which is influenced by the target’s search area size and deadline. A target can have a large search area but a long deadline, which enables a full search of the target area, or it could have a short deadline, which means that only a fraction of the target search area can be covered and the target may or may not be found.

Fig. 3. Clustered UAV scenario with optimal route solutions displayed

Fig. 4. Dispersed UAV scenario with optimal route solutions displayed

The operator is tasked with developing an optimal plan that will maximize the cumulative value. Target values are only rewarded if a UAV finds the target, which occurs at a random point during the search, i.e., it could be located any time between 1 % and 99 % of the search. Furthermore, the operator’s plan can include multiple targets for each vehicle, but the order in which the targets are visited influences how much of the search area can be covered before a target’s deadline. This is comparable to developing a route plan for running errands of varying utility at multiple stores which have different operating hours and which will take variable amounts of time within the store to achieve one’s objectives. In this respect, decisions must be made as to which items to prioritize, since not everything can be accomplished.

The scenarios within SCOUT are pre-scripted such that all the participants initially begin with the same decision making problem and receive new target opportunities and intelligence information at the same time during mission execution. However, every route decision an operator makes within SCOUT influences future decisions, that is, it is a dynamic decision making problem over a time horizon. As such, if two operators implement different initial plans, their UAVs will be moving in different directions and will thus be in different positions when new targets or updated target information occurs in the scenario. A specific vehicle’s location could mean the difference between a new target being a better alternative than the existing plan or a poor choice. Initial studies conducted within SCOUT could determine if one operator’s initial plan was superior to another’s, based on expected utility theory, but had no way of assessing the operator’s decision making quality as the scenario progressed since the two cumulative scores by the end of the mission were not comparable.

To address this shortcoming, NRL has been actively collaborating with the University of Connecticut to develop approaches to determine the optimal target sequence within SCOUT at every decision point for all operators [11]. The optimal solution is based upon a target’s expected value. If a target was valued at 1000 points, but only 55 % of the area could be searched before its deadline, it would have an expected value of 550 points; however, in SCOUT points are either awarded in full or not at all. Decision making within SCOUT is not a simple static process, but rather a continuous dynamic task, since plans need to be monitored and updated as new information becomes available. The computation of an optimal solution allows for the objective evaluation of the operator’s decisions throughout the scenario by comparing the expected value and the time required to complete the operator’s plan with a plan that maximizes the dynamically evolving expected value.
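The dependence of expected value on visiting order can be illustrated with a deliberately simplified sketch. It is not the optimization approach of [11]; it assumes a single UAV, straight-line travel at constant speed, a search that continues until the area is fully covered or the deadline passes, and exhaustive enumeration of orderings, which is only practical for a handful of targets. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Target:
    name: str
    x: float
    y: float
    value: float        # points awarded if the target is found
    search_time: float  # time needed to sweep the whole search area
    deadline: float     # mission time at which the opportunity closes


def plan_expected_value(start, speed, order):
    """Expected value of visiting the targets in the given order."""
    t, (x, y), total = 0.0, start, 0.0
    for tgt in order:
        t += math.hypot(tgt.x - x, tgt.y - y) / speed  # travel leg
        # Time actually spent searching: limited by area size and deadline.
        searched = max(0.0, min(tgt.search_time, tgt.deadline - t))
        total += tgt.value * searched / tgt.search_time
        t += searched
        x, y = tgt.x, tgt.y
    return total


def best_order(start, speed, targets):
    """Brute-force search over all visiting orders (small inputs only)."""
    return max(permutations(targets),
               key=lambda order: plan_expected_value(start, speed, order))
```

With a 1000-point target whose deadline permits 55 % coverage and a 200-point target with a generous deadline, visiting the high-value target first preserves its 550-point expected value, whereas visiting it second can leave no search time at all.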

The route optimization algorithm may also afford a method of quantitatively evaluating the difficulty of different scenarios. The position of vehicles relative to targets makes the optimal plan for some scenarios “easier” than others. For example, within some scenarios, several UAV-target pairings can be eliminated simply based upon proximity and thus reduce the number of alternatives the operator is forced to choose from. Consider the scenarios in Figs. 3 and 4: the scenario depicted in Fig. 3 has the same target locations, point values and deadlines as the targets in Fig. 4, but the starting UAV locations are different: either clustered together or dispersed. Figure 4’s Dispersed UAV scenario illustrates an example planning problem in which the operator can quickly eliminate potential vehicle-target assignments, since it would not make sense for an operator to consider sending a UAV to a target location far away when there are two within its close proximity.

While the optimal route plan for the Dispersed UAV scenario in Fig. 4 yields the maximum reward of 1635 points, the 10th best plan yields only 1267 points and the standard deviation within the top 10 plans is 127 points. This scenario stands in contrast to the Clustered UAV scenario depicted in Fig. 3, in which fewer vehicle-target pairings can be eliminated and the standard deviation within the top 10 plans is much smaller, at only 57 points (see Table 2 below). In the Clustered scenario, since multiple plans yield similar expected values, determining an ideal solution and comparing among various plan options should be qualitatively more difficult for the operator.

Table 2. Optimal plan scores, top 10 plan score averages and top 10 plan score standard deviations for Clustered and Dispersed Scenarios

It is not definitive yet whether the variance among the top optimal scores can serve as a proxy for plan difficulty; however, the authors are developing a version of SCOUT in which the operator would only complete the route planning task across a range of different scenarios (planning blocks only, no mission execution). Comparing the amount of deviation of the top optimal plans with the time required to reach a decision should help determine if this type of metric can be applied in assessing the decision difficulty.
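A candidate difficulty metric of this kind could be computed as in the following sketch, which summarizes the best score, the mean and the standard deviation of the top k plan scores. Whether the reported figures use the sample or the population standard deviation is not stated, so the sample form is an assumption here, and the function name is hypothetical.

```python
from statistics import mean, stdev


def top_plan_spread(scores, k=10):
    """Best score, mean and sample standard deviation of the top-k plans.

    scores: expected values of candidate plans for one scenario.
    A small standard deviation means many near-equivalent plans, which
    the text hypothesizes makes the choice harder for the operator.
    """
    top = sorted(scores, reverse=True)[:k]
    if len(top) < 2:
        raise ValueError("need at least two plan scores")
    return {"best": top[0], "mean": mean(top), "std": stdev(top)}
```

Applied to two scenarios, the one with the smaller "std" would be the candidate for the harder planning problem, which could then be checked against observed decision times.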

Accounting for Risk in Decision Making.

When a SCOUT operator decides to prioritize one target over another, the operator is essentially judging that the potential reward and time requirements of one opportunity supersede those of the other. These decisions made under uncertainty translate into the amount of risk an individual is willing to take. The optimal route solutions discussed above were computed using an expected utility function which is risk neutral. To explain, when applying expected utility theory, if we have a target worth 500 points and search 50 % of its possible searchable area, then the expected value of this target with this search is 250, assuming targets are uniformly distributed throughout the search areas and that the area is swept at a constant rate with respect to time. This is equivalent to:

$$ E_t = V_t \cdot \frac{\mathrm{TD}_t}{\mathrm{TS}_t} \tag{1} $$

where

t = target

E_t = expected value of target t

V_t = point value of target t

TD_t = time allocated to search before target t deadline

TS_t = time required to completely search target t

This equation fails to account for the human decision maker, however, who is not always perfectly rational and may desire to make riskier or more conservative (i.e., risk averse) decisions. For example, a conservative decision maker may feel uncomfortable pursuing a 500-point target which only has a 50 % chance of being found (assuming 50 % of the target’s area can be searched before the deadline). As such, this individual may mentally decrement the target value (say, to 180 points) and view it as being worth less than the expected value. This individual may, therefore, decide to pursue another target worth 200 points, which has a 100 % chance of being found, over this 500-point target, which is valued by the operator at 180. Conversely, an individual who is willing to make riskier decisions may view the original uncertain target as being of higher worth than the expected value (say, 420 points) and therefore be more inclined to pursue this target, which only has a 50 % chance of being found, over a target which is valued at 350 points and has a 100 % certainty of being found.

In order to account for different risk thresholds which a SCOUT user is willing to accept, we propose a modified version of the expected utility function, which we call perceived value, which includes a parameterized risk threshold, R:

$$ \mathrm{PV}_t = V_t \cdot \min\left( \frac{\mathrm{TD}_t}{\mathrm{TS}_t \cdot R},\ 1 \right) \tag{2} $$

where

PV_t = perceived value of target t

R = risk threshold

Here, we see that when R = 1, the equation yields the same solution as the expected value; however, when R > 1 the perceived value decreases, signifying a more conservative risk threshold, and when R < 1 the perceived value increases, signifying a more liberal or risky threshold. R values less than 1.0 essentially represent the percentage of a search area that a decision maker would like to be able to search before the target deadline, or the amount of risk the user is willing to accept. R values greater than 1, on the other hand, serve the purpose of decrementing the value of a target which can only be partially searched (< 100 %) before its deadline in order to represent a more conservative search approach. Also note that the second factor of the PV_t equation is bounded to a maximum of 1 in order to prevent perceived values from exceeding the possible target value, since R values less than 1 can otherwise make TD_t/(TS_t * R) greater than 1.
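As a worked illustration of the perceived value formula, take the 500-point target from the earlier example, for which half of the search area can be covered before the deadline (TD_t/TS_t = 0.5). The values R = 2 and R = 0.5 are chosen here purely for illustration:

$$ R = 1:\quad \mathrm{PV}_t = 500 \cdot \min\left( 0.5/1,\ 1 \right) = 250 $$

$$ R = 2:\quad \mathrm{PV}_t = 500 \cdot \min\left( 0.5/2,\ 1 \right) = 125 $$

$$ R = 0.5:\quad \mathrm{PV}_t = 500 \cdot \min\left( 0.5/0.5,\ 1 \right) = 500 $$

With R = 0.5 the decision maker is satisfied by covering half of the search area, so the target is perceived at its full value, and any smaller R would leave the result capped at 500.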

To demonstrate how this might support a decision maker, consider the simple scenario in Table 3, in which a decision maker has the option to pursue only one of three different targets: Alpha, Beta or Charlie, each of which has a different point value and a different percentage of its search area that can be covered prior to its deadline. The three perceived value columns show the calculated values for different R values, which represent risk neutral, high risk and conservative perspectives. The highest perceived value under each risk threshold is marked with an asterisk; note how the ideal target to pursue varies depending on the R value utilized. This disparity demonstrates why it is important to provide users with mixed initiative planning tools, which enable them to influence the recommended decisions by providing customized weightings.

Table 3. Example scenario involving three targets with varying point values and uncertainty and their perceived values as a function of risk threshold

Reviewing Table 3, we see that under a risk neutral strategy Alpha is the optimal target to pursue; however, a high risk strategy would suggest that Beta is actually a better gamble than Alpha. A conservative strategy, in contrast, weights certainty as more important than the potential for more points and, therefore, suggests Charlie as the best option. This ability to assess target options based upon different risk thresholds enables SCOUT operators to weigh the risks and benefits of various plans, given the context of the mission and its objectives. This tool is currently being integrated within SCOUT in order to provide the operator with a decision support route planning tool. User involvement is especially important in unmanned system applications since the human operator often has access to additional information that planning algorithms do not consider (e.g., intelligence briefs), which might impact how much risk is appropriate in different circumstances.
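The way the recommended target shifts with R can be sketched in a few lines. The target values below are hypothetical and are not the entries of Table 3; they were chosen only so that the three risk thresholds (R = 1, 0.5 and 2, also assumed here) select different targets. Function names are likewise illustrative.

```python
def perceived_value(value, td, ts, r=1.0):
    """Perceived value: value * min(TD / (TS * R), 1)."""
    if ts <= 0 or r <= 0:
        raise ValueError("ts and r must be positive")
    return value * min(td / (ts * r), 1.0)


# Hypothetical targets (NOT Table 3): name -> (V, TD, TS)
TARGETS = {
    "Alpha": (600, 80, 100),     # 80 % of the area searchable
    "Beta": (1000, 45, 100),     # 45 % of the area searchable
    "Charlie": (400, 200, 100),  # deadline leaves time to spare
}


def best_target(targets, r):
    """Name of the target with the highest perceived value under R = r."""
    return max(targets, key=lambda name: perceived_value(*targets[name], r))


for label, r in [("risk neutral", 1.0), ("high risk", 0.5), ("conservative", 2.0)]:
    print(label, best_target(TARGETS, r))
```

In this sketch Charlie is given more time than a full search requires (TD_t > TS_t), so its value is not decremented at R = 2. When every target has TD_t ≤ TS_t, the formula as written scales all perceived values by the same factor 1/R for R ≥ 1, which leaves the ranking unchanged.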

In addition to using different perceived values as weights to drive the optimal path plan options within a decision support tool, these perceived weights can also provide an additional tool for determining an operator’s risk threshold based upon their decisions. This can be accomplished by applying different risk weightings to both the operator’s plan and the optimal plan and checking whether the operator’s weighted plan value exceeds the value of an optimal risk neutral plan. If the operator’s expected point total exceeds the risk neutral point total when a conservative weighting is applied, the operator’s plan would be assumed to be more conservative than the risk neutral optimal plan. Conversely, if the value of the operator’s plan under a risky weighting exceeds the optimal plan’s value, this indicates that the operator likely had a risky decision criterion. If the risk neutral plan exceeds the operator’s plan under all of the different weights, this suggests that the operator may have selected a poor plan. While the perceived value formula above will likely be refined, it provides a powerful tool for investigating risk within a supervisory control task. This enables a number of research questions to be addressed, such as how high and low workload impact an operator’s decision making.
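One possible reading of this comparison is sketched below. It assumes that a plan can be summarized as a list of (V, TD, TS) triples whose perceived values are summed, that R = 2 and R = 0.5 stand in for the conservative and risky weightings, and that both plans are scored under the same weighting; these choices, the labels and the function names are illustrative only.

```python
def perceived_value(value, td, ts, r):
    """Perceived value: value * min(TD / (TS * R), 1)."""
    return value * min(td / (ts * r), 1.0)


def plan_value(plan, r):
    """Sum of perceived values over a plan given as (V, TD, TS) triples."""
    return sum(perceived_value(v, td, ts, r) for v, td, ts in plan)


def infer_risk_posture(operator_plan, neutral_plan,
                       conservative_r=2.0, risky_r=0.5):
    """Label the operator's plan relative to the risk neutral optimum."""
    if plan_value(operator_plan, 1.0) >= plan_value(neutral_plan, 1.0):
        return "risk neutral"  # matches the optimum at R = 1
    beats_conservative = (plan_value(operator_plan, conservative_r)
                          > plan_value(neutral_plan, conservative_r))
    beats_risky = (plan_value(operator_plan, risky_r)
                   > plan_value(neutral_plan, risky_r))
    if beats_conservative and beats_risky:
        return "ambiguous"
    if beats_conservative:
        return "conservative"
    if beats_risky:
        return "risky"
    return "possibly poor"
```

For example, with a hypothetical risk neutral optimum of [(600, 80, 100)], an operator plan of [(400, 200, 100)] is labeled "conservative" and [(1000, 45, 100)] is labeled "risky". In SCOUT the plan value would also depend on visiting order and travel time, which this sketch leaves out.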

4 Summary

This paper reviewed previous research conducted within SCOUT, demonstrating the utility in collecting eye tracking and heart rate measurements within a realistic and complex supervisory control simulation. These findings are especially significant within the application domain of unmanned systems where system interaction can be limited and performance measures are difficult to acquire. These physiological metrics, in addition to performance and mission context information, can be utilized to inform predictions about whether a user will be able to successfully accomplish the mission, such that a user who has elevated pupil sizes, decreased HRV and reduced visual dispersion might be flagged as at risk and provided an alert. Furthermore, the cost of eye tracking technology continues to drop such that highly accurate low cost systems are now available and it is viable to instrument military work stations with eye tracking and heart rate sensors in order to apply research such as this to augment the effectiveness of US Warfighters.

Section 3 of the paper highlighted ongoing work to incorporate new route optimization and decision bias algorithms into SCOUT. One of the more important areas of research for supervisory control of UAVs is planning and re-planning, which takes place throughout the mission and is a multi-objective decision making problem. The incorporation of these new optimization algorithms allows both for continuous assessment of operator decisions and for decision support that suggests when new information (e.g., a new target or updated intelligence) merits a change to the existing plan. The metrics provided by these new algorithms will enable future research investigating how varying levels of task load impact both decision quality and risk biases. Additionally, these algorithms will help drive new decision support tools and research within SCOUT.

Ultimately, SCOUT was created to represent the key characteristics of the multi-objective decision making that is a critical part of ISR UAV missions today, as well as to demonstrate how those decisions will become increasingly complex when operators begin managing multiple vehicles in the future. The researchers plan on making SCOUT freely available by the end of 2016 and encourage others to utilize SCOUT as a tool for their own research.