1 Introduction

Recent unmanned aerial vehicle (UAV) cooperative control technology efforts have focused on developing and demonstrating multi-vehicle assignment, scheduling and route planning for reconnaissance, surveillance and target acquisition (RSTA) tasks [3, 4]. These problem-solving capabilities can produce optimal or near-optimal solutions to minimize mission time or total distance travelled by a team of UAVs as they visit designated points of interest (POIs) to inspect potential targets. These capabilities have been shown to reduce operator workload and the time required to generate solutions compared to performing these tasks manually. The human-automation interaction, however, is fairly rigid, limiting flexibility in mission planning and execution. In an attempt to make the mission planning and management more agile and robust, efforts are underway to develop both adaptable and adaptive automation and interface technology. The adaptable automation provides more options for the human, enabling the operator to make major or subtle adjustments to the plan and associated calculations. The adaptive automation goes a step further with the machine having the ability to initiate changes to the plan based on the situation.

In pursuit of adaptive automation methods, the objective of this research was to develop a reliable Integrated Human Behavioral Model and Stochastic Controller (IHBMSC) [2] that could be coupled to the vehicle assignment, scheduling and path planning automation. The goal was to demonstrate a proof of concept whereby the stochastic control automation would perform both a priori and real-time mission planning based on: (1) a dynamic and uncertain mission situation (modeled as a stochastic environment), (2) vehicle resources, and (3) an individual’s capacity to perform the RSTA task effectively. The effort developed a human operator model that represents an individual’s performance tendencies and biases across a range of target acquisition and workload situations (e.g., arrival rate at POIs, dwell time over POIs, etc.). The idiographic-based model was integrated with the stochastic controller (SC) and cooperative control algorithm, leading to a more “closed-loop” form of problem-solving that not only accounts for the vehicles’ capabilities but also the individual operator’s behavior and performance in the uncertain environment.

The research approach leverages and extends current developments in stochastic control optimization [5, 6] by introducing a representation of an individual’s behavior to the stochastic controller. The goal of the embedded stochastic controller is to plan vehicle assignments, schedules and routing to reduce the probability of false alarms without adversely affecting the target detections using a priori and real-time situation and performance data. The initial stochastic controller design for sequential inspection of objects explored the decision of whether or not a UAV should revisit the point of interest to maximize the total expected information gain given there are multiple points to visit, limited resources, a set of corresponding conditions (position, time, and workload) to factor in, and the operator’s target assessment [5]. The stochastic controller’s revisit decision criteria were optimized based on a Bayes rule information theoretic metric. For this research, the logic behind the SC was the following: On the first inspection (i.e., “first look”) of each POI, the operator makes a call as to whether or not the object is a target. For the first look, the stochastic mission environment may or may not have resulted in a poor view of the POI for the operator. The SC is designed to decide if and how the POI should be revisited. If the SC decides to revisit, it uses the operator model to determine an “optimal” revisit path that will result in a good view of the POI. Note that the SC may also decide not to revisit the object. To determine if the POI should or should not be revisited, the SC takes into account all of the following: the state for the first look, the operator’s response, the number of revisits left (e.g., fuel remaining) and the likelihood of the object being a target (or non-target), which is modeled by the a priori probability of target in this mission environment (also known as target density).
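The paper states that the revisit criterion was optimized with a Bayes rule information-theoretic metric but does not give its equations. The following Python sketch illustrates one plausible form of that decision under stated assumptions: the operator is reduced to state-dependent false-alarm and missed-detection rates, the belief P(target) is updated by Bayes’ rule after the first-look response, and the value of a revisit is the expected reduction in entropy of that belief. All function and parameter names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a Bernoulli(p) belief."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def posterior(prior, said_target, fa, md):
    """Bayes update of P(target) given the operator's call and the
    state-dependent false-alarm (fa) and missed-detection (md) rates."""
    if said_target:
        like_t, like_n = 1 - md, fa        # P(call | target), P(call | non-target)
    else:
        like_t, like_n = md, 1 - fa
    return like_t * prior / (like_t * prior + like_n * (1 - prior))

def expected_info_gain(belief, fa2, md2):
    """Expected entropy reduction from one revisit whose (improved) view
    quality implies second-look error rates fa2 and md2."""
    gain = entropy(belief)
    for said_target in (True, False):
        if said_target:
            p_resp = (1 - md2) * belief + fa2 * (1 - belief)
        else:
            p_resp = md2 * belief + (1 - fa2) * (1 - belief)
        gain -= p_resp * entropy(posterior(belief, said_target, fa2, md2))
    return gain

def should_revisit(belief, fa2, md2, revisits_left, gain_threshold=0.05):
    """Revisit only if budget remains and the expected gain clears a threshold."""
    return revisits_left > 0 and expected_info_gain(belief, fa2, md2) > gain_threshold
```

For example, with a target density of 0.5 and overall error rates like S1’s (FA = 0.35, MD = 0.17), a first-look “target” call gives posterior(0.5, True, 0.35, 0.17) ≈ 0.70, and should_revisit then weighs whether a good second look (e.g., fa2 = md2 = 0.05) is worth spending one of the remaining revisits.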

In pilot testing that helped identify the initial set of stochastic control parameters, and in another experiment with related multi-UAV tasks involving visual inspection of targets [1], a wide variance in performance was observed across operators. Given this, a human behavioral model that is based on the group’s average performance may not represent a significant portion of the users very well, resulting in mismatched plans and UAV actions for a number of end-users. The research reported here takes a different approach by representing the individual’s behavior and performance in a computational human behavioral model (also referred to as the “operator error model” in this paper) which, in essence, customizes the problem-solving and resulting mission plans for the individual operator at the control station. During the mission, the controller can be used to monitor the actual states (e.g., ground-sampled distance, aspect angle, and dwell time) and compare them to the expected states, computing the probability of detecting a given target. When disparities exist between what was reported versus expected, or where conditions are beyond prescribed levels, the system can potentially adapt mission plans and actions, manipulating such things as the UAVs’ routes, automated sensor steering action points, and the employment of video inspection tools (e.g., digital video recording methods, mosaics).

2 General Method

2.1 Task

The RSTA sequential inspection task required the test subject to view an object in a video and classify the object as a target or a non-target. Two seconds before the object entered the field of view, an alert appeared on the interface to signal the test subject. The object was a white van with a black letter on the side (Y vans were targets and V vans were non-targets). The test subject viewed the object and then responded either Target or Non-Target by pressing a button on the interface (Fig. 1).

Fig. 1. Vehicle with Target (Y) and Non-Target (V) features

Confidence ratings were collected after each Target/Non-Target response. The confidence ratings were defined as: 1 = no confidence; 2 = low confidence; 3 = moderate confidence; 4 = high confidence; 5 = very high confidence. The test subject verbally reported his/her confidence rating to the test administrator, who recorded it. Confidence ratings were collected for a number of reasons: (1) to investigate the relationship between operator confidence and performance, (2) to enable some bias analyses with signal detection theory (not presented in this paper) that were not possible with only Target/Non-Target responses, and (3) to assess their potential to further improve system performance if they could be incorporated into the Stochastic Controller (SC).

In the task, the test subject was instructed to “do the best you can” (i.e., maximize accuracy) and no performance feedback was provided during training or testing. The test subject was instructed to sit a fixed distance from the monitor. No chinrest was used, but the distance was indicated with a marker and monitored by the test administrator. There were 20 POIs per run and each POI had one object to inspect. Half of the objects were Targets and half were Non-Targets. In addition, 12 POIs were revisited in each run because the SC was incorporated into the system during all testing. The initial views of POIs combined with the revisits resulted in 32 observations per run. Each run lasted approximately 13 min. Between runs, the test subject was given a short break that also allowed the simulation software to be reset with a different scenario. There were four scenarios used in the study. Each scenario had the same arrangement of POIs with a random arrangement of target objects. The test subject experienced scenarios in sequence to maximize the time between repeating a scenario. Test sessions never entailed more than 5 runs at a time, and typically one test session was conducted per day. On the few days that had two test sessions, one was held in the morning and one in the afternoon to allow a rest period of more than an hour.

2.2 Test Design

The study had four variables that defined the condition (i.e., “state”) under which the Unmanned Aerial Vehicle (UAV) flew by an object: Ground Sample Distance (GSD), Aspect Angle (AA), Dwell Time (DT), and Inter-Event Time (IET). GSD is a measure of digital image resolution: the distance on the ground covered by one pixel. A smaller ground sample distance corresponds to a higher resolution image. GSD was manipulated by fixing the camera zoom of the UAV and changing the altitude of its flight path. GSD, in combination with the constants of display size and the test subject’s fixed distance from the display, effectively manipulated the visual angle of the letters (i.e., their size on the retina). AA is the angle in two-dimensional space between the vertical side of the van displaying the task letters and the UAV’s flight path. DT is a measure of how long the task letter was presented within the simulated UAV sensor (i.e., the observation interval). DT arose from the time it took the letter to move across the sensor’s field-of-view, which was a fixed distance on the display monitor; therefore, DT co-varied with the letter’s velocity on the display monitor. IET is a measure of how much time elapsed from the end of one observation to the start of the next (i.e., the inter-stimulus interval). IET was used as a way of affecting operator workload, analogous to the event rate in vigilance tasks.

A planner was used to generate a UAV trajectory to visit all the objects. The four variables were randomly sampled from uniform distributions to simulate the stochastic nature of the realized flight path (Table 1). This random sampling procedure differs from most psychological methods (e.g., the method of constant stimuli), but captured important variability that is present in the applied setting. For revisits, the premise was that once the UAV flew over the object and the object was in the image frame, the true state could be determined. For example, if the size of the object was known a priori, then the UAV height above terrain could be computed, as well as an altitude bias at that location. Also, knowing the ground speed, airspeed, and heading, the local wind vector could be estimated. Given the local altitude bias, the actual object pose angle, and the local wind vector, a revisit command could then be computed that realized the optimal revisit state specified by the stochastic controller.

Table 1. Ranges of independent variables

The simulation approximated the previously described procedure and uncertainty. For the initial object visit state, or first look, the planner algorithm sampled the altitude, aspect angle, and speed (dwell) from uniform distributions to yield the commanded and realized state. For the revisit, or “second look”, the optimum state commanded by the stochastic controller was usually achieved by the simulation; however, a small amount of variability was present for a subset of second looks.
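As an illustration of the sampling just described, here is a minimal Python sketch of first-look state generation, assuming independent uniform draws for each variable. The numeric ranges below are placeholders; the study’s actual ranges are those listed in Table 1.

```python
import random
from dataclasses import dataclass

@dataclass
class LookState:
    gsd: float   # Ground Sample Distance (resolution; smaller = sharper image)
    aa: float    # Aspect Angle (degrees between van side and flight path)
    dt: float    # Dwell Time (seconds the letter is within the sensor view)
    iet: float   # Inter-Event Time (seconds between observations)

# Placeholder ranges; the actual experimental ranges appear in Table 1.
RANGES = {
    "gsd": (0.05, 0.20),
    "aa": (10.0, 90.0),
    "dt": (1.0, 4.0),
    "iet": (5.0, 40.0),
}

def sample_first_look(rng=random):
    """Draw one first-look state, mirroring the planner's stochastic sampling
    of altitude (GSD), aspect angle, and speed (dwell) for each POI."""
    return LookState(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})
```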

2.3 Equipment

The apparatus was the Vigilant Spirit Control Station (VSCS), developed by 711HPW/RHCI. It is a research test-bed used to develop, integrate and test advanced technology for single-operator, multi-UAV RSTA tasks and missions [3]. VSCS served as a single interface for performing both piloting and sensor operations and test subjects interacted with it through a mouse and keyboard. For these tests, we used a customized version of the VSCS that consisted of two windows (Fig. 2). Each window was displayed on its own 24” monitor at a screen resolution of 1920 × 1200 (Dell 2408WFPb; Round Rock, TX). On the left window was the Tactical Situation Display (TSD) which showed the real time position and flight plan of the UAVs. On the right window were the individual video feeds from each UAV.

Fig. 2. Tactical situation display and 4-UAV video display

The TSD is shown on the left side of Fig. 2. The test subject could see the 4 UAVs and their point search tasks at a compressed scale. The video display is shown on the right side of Fig. 2. It shows the simulated feeds streamed from each of the UAVs. The video images were generated using MetaVR©. There was a 3-dimensional terrain database and cultural objects (i.e., models) defined for the simulated video. The individual UAV dynamics and sensor motion were simulated on the VSCS. UAV positions and states were continuously streamed to the graphic computers to define a viewing state into the 3-dimensional database. The view at this state was then sent to the VSCS and displayed on the individual video panes. The Target/Non-Target buttons were located to the right of the corresponding video. The alert appeared behind the buttons as red fill and is illustrated in two of the video panes.

3 Proof of Concept Demonstration

The demonstration comprised the creation of an operator model and an assessment of the Stochastic Controller (SC). For the assessment, we expected that the operator and SC combined (performance with revisits) would perform better than the operator alone (performance without revisits). Conducting this assessment required a human-in-the-loop test in which the SC contained an operator error model of that individual; since no such model existed, we had to create it prior to testing. The operator error model in the SC had to capture how an operator’s performance varies as a function of four variables of interest: Ground Sample Distance (GSD), Aspect Angle (AA), Dwell Time (DT), and Inter-Event Time (IET). We expected that operator error would be higher with higher values of GSD than with lower values, and higher with lower values of AA, DT, and IET than with higher values. No predictions were made concerning interactions between variables; however, we expected that model development could be made more efficient if no interactions were found, because future operator models could then be developed by modeling the impact of each variable independently without needing to factorially combine variables.

3.1 Participant

One member of the research team volunteered to participate as a test subject (S1). S1 had normal vision and was 21 years old. She was experienced with psychological research tasks and was informed of the purpose of the study.

3.2 Task

The test subject was introduced to the interface and the task prior to beginning data collection. The test subject was told that there were an equal number of targets and non-targets in each run. For the model creation effort, the subject performed 120 runs that each had 10 Target and 10 Non-Target trials (2400 trials overall). This number of trials would provide approximately 75 samples of each of the 16 modeled states (median-split combinations of all four variables) for target and non-target trials. Some technical issues with the data logging software resulted in losing a small number of trials (<1 %), leaving 2,395 trials for creation of the operator error model.

It is important to note that the model creation runs also included the SC in the system loop, meaning that each run had 12 second-look (revisit) trials in addition to the 20 first looks. For the SC to function within the system loop during initial model creation runs, the SC used an operator model constructed from a preliminary dataset. This model was replaced by an interim model for S1 after 25 runs and updated again after 75 runs. The preliminary and interim models were relatively similar, and changing them did not appear to impact performance during testing. Once the complete 120-run operator model was created, an additional 10 runs were conducted using that model to assess the performance impact of the SC.

3.3 Results

A total of 120 data runs (2,395 “first look” trials) were used to create the operator model. Analysis of these trials and generation of the operator model were performed using descriptive statistics. Performance for S1 was: false alarm rate (FA) = 35 %, missed detection rate (MD) = 17 %, and percent correct (PC) = 73.8 %. The final operator error model had 16 different states based on median-split combinations of the 4 variables (i.e., the range of values of each variable was split into high/low categories about the median). The stochastic selection of states resulted in an average of 74 samples per state with a standard deviation of 11 samples. Across states, FA rates ranged from 2.8 % to 69.9 % and MD rates ranged from 3.3 % to 36.0 %.
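The paper does not publish the model-construction code; a minimal Python sketch of the described procedure, assuming trials are logged as simple records with the four variable values plus ground-truth and response booleans (field names here are illustrative), might look like this:

```python
import statistics
from collections import defaultdict

VARS = ("gsd", "aa", "dt", "iet")

def build_operator_model(trials):
    """Bin first-look trials into 16 states (high/low median splits of the four
    variables) and estimate the false-alarm and missed-detection rate per state.

    Each trial is assumed to be a dict holding the four variable values plus
    booleans 'is_target' (ground truth) and 'said_target' (operator response)."""
    medians = {v: statistics.median(t[v] for t in trials) for v in VARS}
    bins = defaultdict(list)
    for t in trials:
        state = tuple(t[v] > medians[v] for v in VARS)  # 2^4 = 16 possible states
        bins[state].append(t)
    model = {}
    for state, ts in bins.items():
        targets = [t for t in ts if t["is_target"]]
        nontargets = [t for t in ts if not t["is_target"]]
        model[state] = {
            "fa": sum(t["said_target"] for t in nontargets) / max(len(nontargets), 1),
            "md": sum(not t["said_target"] for t in targets) / max(len(targets), 1),
            "n": len(ts),
        }
    return model
```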

Percent correct for the combination of both levels of all four variables is shown in Fig. 3. Parallel lines in each panel would suggest an absence of interactions between variables. The converging lines suggest that interactions may be present, though this was not statistically tested.

Fig. 3. Operator percent correct performance for the 16 model bins

To gain better insight into the effects of the variables, we approximated psychometric functions using the first look data. Psychometric functions show percent correct across a range of variable values. They were approximated by repeatedly splitting each high/low group about its median (Fig. 4). The figure shows orderly effects for GSD, AA, and DT, though the range of variable values was restricted. The effect of IET was less orderly than that of the other variables. A linear fit to IET yielded a slope of almost 0 and an intercept at the average PC for all first look trials.

Fig. 4. Approximated psychometric functions for each variable
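A sketch of the repeated median-splitting used to approximate these psychometric functions, assuming the same illustrative trial records as above plus a boolean 'correct' field:

```python
import statistics

def approx_psychometric(trials, var, depth=2):
    """Approximate a psychometric function for one variable by recursively
    median-splitting trials on that variable and computing percent correct
    in each leaf bin. depth=2 yields four (mean value, percent correct) points."""
    if depth == 0 or len(trials) < 2:
        pc = 100.0 * sum(t["correct"] for t in trials) / max(len(trials), 1)
        mean_v = statistics.mean(t[var] for t in trials) if trials else float("nan")
        return [(mean_v, pc)]
    med = statistics.median(t[var] for t in trials)
    low = [t for t in trials if t[var] <= med]
    high = [t for t in trials if t[var] > med]
    return (approx_psychometric(low, var, depth - 1)
            + approx_psychometric(high, var, depth - 1))
```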

The results from the 10 assessment runs that used the final operator error model are shown in Table 2. The SC reduced both the False Alarm (FA) and the Missed Detection (MD) rates, resulting in a large increase in percent correct.

Table 2. Results from final operator error model

The confidence ratings for first look trials showed that the most common response was 2, followed by 1; together these accounted for 84.5 % of responses. Ratings of 3, 4, or 5 accounted for the remaining 15.5 %. Figure 5 shows average performance as a function of confidence rating. Confidence was strongly related to performance: performance was very high for confidence ratings of 3, 4, and 5, and low for ratings of 1 or 2. Simple correlations between confidence responses and variable values showed that GSD had a strong negative relationship to confidence (r = −0.61), AA and DT had medium positive relationships (r = 0.21 and 0.20, respectively), and IET had no relationship (r = 0.01).

Fig. 5. Relationship between confidence ratings and performance

3.4 Discussion

Through this study we were able to create a model of an individual operator’s performance during a target detection task and then integrate this model into UAV control automation for a proof of concept demonstration. For the ranges sampled, the variables of GSD, AA, and DT affected the operator in the expected direction (as indicated by the direction of their slopes). Low values of GSD increased performance compared to high values, and high values of AA and DT increased performance compared to low values. GSD had the most drastic effect on operator performance. The variable IET had little to no effect on performance. One specific IET condition of interest was the occurrence of overlapping observations, i.e., instances in which the operator had to split viewing time between multiple video feeds. We expected that overlap instances would impact performance because they are related to how much time the operator has to view the POI. There were few trials with 3 or 4 simultaneous letters onscreen (41 trials, or 0.34 %) because overlap instances, and IETs in general, were not designed into the stimulus conditions, as previously mentioned. To increase the sample size, all overlap trials, including instances of 2 simultaneous letters onscreen, were grouped together (559 trials, or 4.66 %). To further increase the sample size, we calculated a “generous” measurement of overlap, which began at the pre-observation cue that alerted operators to the upcoming inspection point and ended at the time of response (845 trials, or 7.05 %). For all the ways we identified overlap trials, the difference in percent correct between overlap trials and all other trials was less than 1 %. One difficulty in assessing the effect of overlap in this data set is that the overlap trials contained both first looks and second looks. Second looks were consistently very high in accuracy, and so reduced the variability in performance.

The results showed that the stochastic controller was able to aid the operator in the target detection task. The SC reduced the FA rate by nearly a factor of 8, reduced the MD rate by a third, and increased the PC by nearly 20 %.

4 Robustness Test

As was noted in the introduction, previous work has shown individual differences in target acquisition which led us to investigate an idiographic-based modeling approach. To assess the robustness of applying the idiographic-based model to others, we tested the performance of operators with the Stochastic Controller (SC) when the SC contained a model of a different person. If human behavior under these viewing conditions is sufficiently similar, then overall performance with the SC should still benefit from the ‘mismatched’ model compared to unaided performance.

4.1 Participants

Two members of the research team volunteered to participate as test subjects (S2 and S3). Both subjects were male with normal or corrected-to-normal vision and were 24 and 28 years old (mean = 26). Both had previous experience in psychological research and full knowledge of the research goals and task.

4.2 Task

This test used the same apparatus and task as the proof of concept demonstration, and the SC used the operator model from S1 for both subjects. Both test subjects were trained on the task (one for 2 runs, the other for 5 runs), with the performance criterion for participation set at PC > 70 %. A total of 17 runs per subject were conducted, resulting in 340 trials per subject for analysis.

4.3 Results

The results from these two subjects, in comparison to subject S1’s results from the SC assessment in the proof of concept demonstration, are presented in Table 3. With the SC, FA and MD decreased for both subjects, yielding an overall PC increase of at least 14 %. Confidence scores also increased as performance increased, averaging about 1.5 points higher with the SC. Mean confidence values for the operator with the SC were calculated by replacing the first look confidence rating with the second look confidence rating for revisited targets.

Table 3. Results of robustness test
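The adjusted confidence calculation described above is straightforward; a sketch, assuming ratings are keyed by POI (a hypothetical structure, not the study’s actual logging format):

```python
def mean_confidence_with_sc(first_look_ratings, second_look_ratings):
    """Mean confidence where, for any revisited POI, the second-look rating
    replaces the first-look rating. Both arguments map POI id -> rating (1-5)."""
    merged = dict(first_look_ratings)
    merged.update(second_look_ratings)   # revisited POIs take the second-look value
    return sum(merged.values()) / len(merged)
```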

4.4 Discussion

Data from the second study showed a large degree of similarity across test subjects. Unaided performance was similar across subjects, as were the direction and magnitude of the benefit of the SC. These data suggest that an operator error model for GSD, AA, DT, and IET may be robust and generalizable to different subjects who are similarly trained and selected.

5 General Discussion

This work tested the concept of representing human performance to control automation via an operator model for the purpose of increasing target detection performance. The demonstration created an operator performance model for values of GSD, AA, DT, and IET. The model was created using descriptive statistics and had 16 states arising from high/low combinations of the four variables. This model was incorporated into the control automation, the Stochastic Controller (SC), and validated. The operator aided by the SC outperformed the unaided operator in both false alarms and missed detections. A second test was conducted to determine whether the benefits of the SC and operator model would be robust to operators who were not used to create the model. That study showed that different operators could still benefit from the SC when the SC contains a mismatched model. One limitation of the robustness test is that only 2 subjects were used; perhaps the model resembled these subjects’ performance by chance. A future study should increase the number of subjects to better test the robustness of the model and use a more rigorous test procedure. One such procedure would be to create a complete operator error model for each additional test subject, compare the differences between their models, and test performance with each other’s ‘mismatched’ models. Operators may prove not to be similar on GSD, AA, DT, and IET when studied in this more thorough fashion. Moreover, other variables of interest may reveal additional differences between operators (e.g., contrast sensitivity is known to vary between people and to produce differences in performance on target identification tasks).

This research used UAV control automation to determine whether or not to revisit POIs. The revisit decisions from the SC are optimal from an information theory perspective. Future research is planned to compare the SC revisit strategy and resulting performance to a human’s revisit strategy (fully manual revisit decisions) and to a human receiving revisit suggestions from the automation (management by consent). In the data from the two studies reported here, subjects’ confidence ratings were highly correlated with their target detection performance. We therefore expect that humans would choose to revisit when confidence was low, and that there would be little to no difference between human revisit decisions and the theoretically optimum decisions of the SC. Moreover, fully automated revisit decisions are likely to face acceptance challenges, especially when the revisit decisions contradict the operator’s intent.