1 Introduction

Human-machine teaming is becoming an ever-present aspect of executing modern military missions. It spans a broad spectrum of automation capabilities, from traditional automation, to robots, to semi-autonomous systems, and even to autonomous teammates. Designing these systems requires not only great technical skill, but also a deep understanding of the interaction between the human operator and the automated system. Design decisions made early in the system development life cycle can have long-term repercussions for the effectiveness and utility of the system, so it is vital to have an effective mechanism for predicting and evaluating these interactions and for predicting how automation design decisions influence the human's performance and the overall system/team performance. Simulation is one such mechanism. It is relatively cost- and time-efficient, allows designers to easily explore a large option space, and can be used early in the design process, before the system is built and before key design decisions are made.

The purpose of this paper is to explore the role that simulation can play in the design of automated systems in order to improve human-automation interactions. This discussion includes current human-performance simulation research being performed at the Air Force Institute of Technology (AFIT) aimed at evaluating team performance and operator mental workload in order to make system design tradeoffs, perform task allocation, design adaptive systems, and design interfaces. In addition to predicting team performance, this line of simulation research is also seeking novel methods to evaluate situation awareness, trust, and fatigue.

2 Performance and Performance Tradeoffs

Performance is an important consideration in system design, which often includes component-level and integrated testing in order to ensure actual system performance is commensurate with the planned/desired level of performance. When evaluating system designs, the human operator is often treated as a reliable, error-free, high-performing external actor. However, in order to truly evaluate overall system performance, the performance of the user with respect to their interaction with the system must be taken into account. Of course, in achieving desired system and/or human performance, sometimes tradeoffs need to be made with other factors such as system cost or size [1-3], and user manning or workload [4, 5]. For example, in order to increase overall system performance, user manning may need to be increased. In turn, this might necessitate an increase in system size, resulting in higher system cost.

Automation can sometimes be used to offset human performance limitations [6, 7]. For example, system automation could be used to augment the user's performance of tasks, thus increasing the overall performance of the human-system team while also mitigating the need for other tradeoffs (e.g. increased manning or task allocation) to achieve the same level of performance.

Simulation can be used to conduct trade studies on performance in order to estimate the efficacy of potential design decisions or potential re-design options. Each design option can be modeled as a separate alternative system, and the results from the simulation runs will reveal the expected performance outcomes as well as unanticipated emergent behavior.

For example, research performed by Watson [8] used simulation to demonstrate the value of considering human performance when making system design tradeoff decisions. The study used a threat detection scenario in which automation was implemented as a way to augment user performance. The Improved Performance Research Integration Tool (IMPRINT) was used to simulate six automation alternatives with varying design parameters in order to predict which alternative produced both the largest increase in team performance and the largest decrease in user mental workload. The study showed that by performing these simulated tradeoff analyses, the effect of the human's performance on overall system performance can be seen, thus enabling more informed design decisions.
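A trade study of this kind can be prototyped quickly even outside a dedicated tool such as IMPRINT. The sketch below compares hypothetical automation alternatives using Monte Carlo simulation; the alternatives, accuracy figures, and workload values are illustrative assumptions, not parameters from the Watson study.

```python
"""Sketch of a simulation-based trade study over automation alternatives.

All parameters (aid accuracies, workload deltas) are hypothetical
illustrations, not values from any published study.
"""
import random

# Hypothetical alternatives: (name, detection-aid accuracy, added workload).
ALTERNATIVES = [
    ("no_aid",      0.00, 0.0),
    ("cueing_aid",  0.70, 1.5),
    ("auto_detect", 0.90, 0.5),
]

BASE_HUMAN_ACCURACY = 0.75   # assumed unaided human detection rate
BASE_WORKLOAD = 6.0          # assumed baseline mental workload (arbitrary scale)

def simulate(aid_accuracy, workload_delta, n_threats=10_000, rng=None):
    """Monte Carlo estimate of team detection rate and mean workload."""
    rng = rng or random.Random(1)
    hits = 0
    for _ in range(n_threats):
        # A threat is detected if either the human or the aid catches it.
        if rng.random() < BASE_HUMAN_ACCURACY or rng.random() < aid_accuracy:
            hits += 1
    return hits / n_threats, BASE_WORKLOAD + workload_delta

for name, acc, dw in ALTERNATIVES:
    perf, wl = simulate(acc, dw)
    print(f"{name:12s} detection={perf:.3f} workload={wl:.1f}")
```

Even this toy version exposes the tradeoff structure: an aid that raises team detection may raise or lower workload, and the preferred alternative depends on which dimension the design prioritizes.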

3 Task Allocation

In addition to system-level trades, when designing human-machine teams, another important design decision is which tasks should be allocated to the human and which ones should be allocated to the automated system. Automated systems and human operators bring unique qualities and abilities to the work environment. Automated systems do not lose vigilance and can perform certain tasks, such as computations, almost instantly. However, automated systems often lack flexibility since they operate within the bounds of their code. Humans may lose vigilance in monitoring tasks [9], but are flexible decision makers in response to unusual or unforeseen circumstances. Additionally, humans and automation can also have overlapping abilities in which a task could be performed suitably by either [10]. Task allocation decisions should consider: the abilities and limitations of both the human and the system; the other tasks that each is also managing; and the required handoffs that occur due to the division of tasks.

Modeling and simulation provides an opportunity to examine how variations in task allocation and handoff interactions affect human behavior and workload, as well as overall system performance. In dynamic environments, it is too costly to perform human-subject experiments to the extent necessary to discover the preferred task allocation. Therefore, well-designed models are a cost-effective means of predicting the effects of task allocation in response to environmental changes.

For example, research conducted by Goodman et al. [11] evaluated task allocation options using an automated route-generation aid in an Air Traffic Control-style task. The objective of the task is to direct incoming ships to their corresponding destinations while avoiding negative outcomes, such as ship collisions. There are two high-level functions: identifying an incoming ship and creating a route. In this research, the automated aid was given responsibility for identifying incoming ships and the human for route creation. SysML activity diagrams [12], in conjunction with the IMPRINT simulation tool [13], were developed to assess task allocation. This research revealed that modeling task allocation requires careful consideration of human-automation communication actions. Explicit depictions of outputs and inputs are necessary to properly model interactions. Another consideration is human behavior adjustments in response to the automation's actions. It should not be assumed that the human will consistently perform the same behaviors when working with an automated teammate as he/she would when working solo. It is important to capture dynamic human decision making, such as acting upon the automation's suggestions, ignoring the automation, or performing additional activities specifically aimed at maintaining situation awareness.

4 Design of Adaptive Systems

In addition to traditional allocation decisions, human-machine teaming has seen rising interest in adaptive automation, in which task allocation decisions are made dynamically according to factors such as the state of the human operator, environment, system, or other information [14]. In these systems, the automation provides assistance to the operator on an as-needed basis. Compared with static automation, adaptive automation based on operator need has the benefit of minimizing the risks of automation (e.g. increased boredom, decreased situation awareness) while maximizing the benefits (e.g. reduced workload, increased performance) [15]. Dynamic physiological assessment has been suggested as a potential method for measuring operator need in adaptive systems [16]. Recent studies at AFIT have used discrete event simulation (DES) and physiological recording (i.e. electrocardiogram (ECG), electroencephalogram (EEG), and electrooculogram (EOG)) to explore different methods that can be used to dynamically assess operator need and inform adaptive augmentation decisions.
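A minimal sketch of such an as-needed augmentation policy is shown below. The workload trace, threshold, and hysteresis band are illustrative assumptions; in a fielded system the workload estimate would come from a physiological classifier rather than a synthetic trace.

```python
"""Minimal sketch of a workload-triggered adaptive automation policy.

The threshold, hysteresis band, and synthetic workload trace are
illustrative assumptions, not measured physiological values.
"""

OVERLOAD_THRESHOLD = 7.0  # assumed workload level above which the aid engages

def adapt(workload_estimate, aid_active):
    """Engage the aid when estimated workload exceeds the threshold, with
    simple hysteresis so the aid does not rapidly toggle on and off."""
    if not aid_active and workload_estimate > OVERLOAD_THRESHOLD:
        return True
    if aid_active and workload_estimate < OVERLOAD_THRESHOLD - 1.0:
        return False
    return aid_active

# Walk a synthetic workload trace through the policy.
trace = [5.0, 6.5, 7.5, 8.0, 6.5, 5.5]
active = False
for w in trace:
    active = adapt(w, active)
    print(f"workload={w:.1f} aid_active={active}")
```

The hysteresis band (engage above 7.0, disengage below 6.0) is one common guard against the aid oscillating when workload hovers near the threshold.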

Giametta and Borghetti [17] recently demonstrated the effectiveness of physiologically-based assessment in remotely piloted aircraft (RPA) surveillance tasks. The group used supervised machine learning techniques to link physiological features, collected from EEG recordings, to periods of additional workload caused by multi-tasking. The EEG-based classifiers were able to correctly identify periods of increased workload 80.9 and 83.4 percent of the time in two different multiple-objective RPA scenarios.

In a similar study involving the same RPA scenarios, Smith et al. [18] used IMPRINT to provide detailed user activity data to supervised machine learning techniques. Rather than using the coarse task classification labels (e.g. high workload, low workload) found in the majority of EEG-based classification studies, the group created continuous analytic workload profiles (CAWPs) that provided unique workload values for each second of user activity. The CAWPs enabled "detailed application and research analysis not possible with subjective measures alone" [19].

Recently, Giametta [20] used stochastic simulation techniques to overcome the real-world challenges of creating second-by-second CAWPs for dynamic user tasks. His work showed that representative workload profiles could be crafted using CAWPs from a small set of previously observed subjects performing similar tasks. After identifying and observing the user activities that governed task difficulty and timing in RPA simulations, stochastic variables were fit to distributions matching each of the observed actions and then resampled multiple times to create representative workload profiles. Supervised machine learning models were then calibrated for new subjects by collecting only EEG data during training tasks and pairing it with the previously created representative profiles.
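The fit-then-resample step can be illustrated with a short sketch. The task names, observed durations, and per-task workload values below are hypothetical placeholders, and a simple normal distribution stands in for whatever distributions best fit the observed actions.

```python
"""Sketch of building a representative workload profile by fitting
distributions to observed task durations and resampling them.

Task names, durations, and workload values are illustrative assumptions,
not Giametta's fitted parameters.
"""
import random
import statistics

rng = random.Random(42)

# Hypothetical observed task durations (seconds) from prior subjects.
observed = {
    "scan_sensor":  [4.1, 3.8, 4.5, 4.0, 4.3],
    "route_update": [9.7, 10.4, 8.9, 10.1],
}

# Fit a simple normal distribution to each task's observed durations.
fitted = {
    task: (statistics.mean(xs), statistics.stdev(xs))
    for task, xs in observed.items()
}

def sample_profile(task_sequence, workload_per_task):
    """Resample task durations to build a second-by-second workload profile."""
    profile = []
    for task in task_sequence:
        mu, sigma = fitted[task]
        duration = max(1, round(rng.gauss(mu, sigma)))  # whole seconds, >= 1
        profile.extend([workload_per_task[task]] * duration)
    return profile

profile = sample_profile(
    ["scan_sensor", "route_update", "scan_sensor"],
    {"scan_sensor": 3.0, "route_update": 6.5},  # assumed workload values
)
print(f"{len(profile)} seconds, peak workload {max(profile)}")
```

Resampling the fitted distributions many times yields a family of plausible profiles for a task sequence, which is what allows new subjects to be calibrated without hand-built second-by-second CAWPs.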

Studies at AFIT continue to focus on assessing operator need using physiological data. DES has given these researchers a unique view of operator workload during dynamic tasks, allowing them to link physiological features to workload in multiple military scenarios, which will ultimately enable the design of effective adaptive systems.

5 Interface Design

The effectiveness of the human-machine team is highly dependent upon the mechanism that the human and machine use to communicate with each other—the interface. Poorly designed interfaces could cause either teammate to misinterpret the goals, intentions, or information being conveyed by the other. Simulation enables designers to explore a range of interface design options, across a variety of design detail levels.

Work by Rusnock and Geiger [21] used human-performance simulation to model the cognitive workload of an operator performing intelligence, surveillance, and reconnaissance tasks using remotely-controlled unmanned ground and aerial systems. The study evaluated the relative performance and expected workload for three types of interface redesigns for the control system: keyboard, voice recognition, and touch screen. Through this evaluation the study was able to find the alternative that offered both workload and performance improvements.

Goodman et al. [22] evaluated the intervention timing of an automated route-generation aid in an Air Traffic Control-style task. A DES, validated by human-in-the-loop experimentation, revealed that agent intervention timing has a significant impact on human behavior, workload, and team performance. Further simulations suggested that agent intervention timing should not be static, but rather a function of environmental event rates.

Kim [23] used simulation to evaluate the use of a 3D audio interface for multiple aircraft control. The aim of this research was to determine whether the workload and performance of an operator responsible for tracking radio traffic of multiple unmanned aircraft could be improved through the use of a 3D audio system over the current system. The 3D audio system uses voice recognition to automatically differentiate critical information (i.e., radio calls including the call sign of an aircraft under control) from distracting information (i.e., radio calls including other call signs) and presents this information to different ears of the operator. The researchers used IMPRINT simulations to predict the effects of the 3D audio system, which showed promise in reducing UAV operators' workload and response time. The simulation predicted that the operator would shed the more complex cognitive and physical tasks associated with call sign recognition and relevance determination, instead performing the simpler perceptual task. This conversion of tasks would permit operators to quickly distinguish critical from distracting information, thus reducing workload, when using the 3D audio system. Furthermore, one interesting expectation from the simulation results was that the operators' workload and performance would not be influenced by the number of call signs under the operator's control while using the 3D audio system. These simulation results were later confirmed through human-subjects experimentation.

6 Assessing Situation Awareness

As automated teammates take over an increasingly larger number of tasks previously performed by humans, one major concern for human-machine teaming is the potential loss of operator situation awareness (SA). Situation awareness is the operator's perception (or mental model) of elements in the environment within a volume of space and time, the comprehension of their meaning, and the projection of their status into the future [24, 25]. For example, the ability of a pilot to keep track of an aircraft's whereabouts, status, weather, fuel state, terrain, and, in combat, enemy disposition is critical to effective aircraft operation. In critical phases of flight, in poor weather, or in the face of systems malfunctions, appropriate SA can mean the difference between mission success and failure, or even survivability [26].

Because of the critical effects situation awareness has on mission outcome and survivability, designers and operators both have a vested interest in maintaining a high level of SA. Operators develop procedures and train to maximize the use of all available tools and observations to increase SA [26]. Designers can incorporate technology specifically designed to enhance SA such as heads-up displays (HUD), multi-function displays, automation aids, expert systems, advanced avionics, and sensors that provide more information in a more useful manner [24]. Designing for enhanced SA must consider the number of required tasks, the workload of those tasks, and the information provided to the operator during their completion [27]. In order to design for SA, we must be able to predict/estimate SA during the design process.

Recently, Meyer [26] demonstrated the use of discrete-event simulation to measure potential SA. While this simulation inherently reflects an optimistic estimation, it can model the complex relationship between operator workload and SA as the human interacts with the machine. The method uses two separate algorithms: one workload-dependent and one task-dependent. The workload-dependent measure, Strategic SA, is computed from the cognitive workload experienced below the overload threshold. This corresponds with Endsley's theory that high SA can be maintained under increasing workload conditions until approaching overload, at which point it deteriorates rapidly [28]. Strategic SA allows the operator to have independent and unpredictable priorities for information gathering. The task-dependent measure, Tactical SA, evaluates the information gathered from accomplishing a specific task. Some tasks, such as actuating a button or switch, provide no SA, while others, such as reading an instrument or display, provide noticeable SA. Summing Tactical and Strategic SA together yields Total SA [26].
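The additive structure of this measure can be illustrated with a toy computation. The workload trace, overload threshold, and per-task SA credits below are illustrative assumptions; only the Strategic + Tactical = Total structure follows the method described above.

```python
"""Toy computation of Total SA = Strategic SA + Tactical SA.

Workload values, the overload threshold, and per-task SA credits are
illustrative assumptions, not values from Meyer's models.
"""

OVERLOAD_THRESHOLD = 8.0  # assumed workload ceiling beyond which SA degrades

def strategic_sa(workload_trace):
    """Credit only workload experienced at or below the overload threshold."""
    return sum(w for w in workload_trace if w <= OVERLOAD_THRESHOLD)

def tactical_sa(completed_tasks, sa_credit):
    """Credit SA gained from specific tasks (e.g. reading a display);
    tasks absent from the credit table, like a button press, yield none."""
    return sum(sa_credit.get(task, 0.0) for task in completed_tasks)

workload = [3.0, 5.5, 7.0, 9.5, 4.0]   # one sample per simulated interval
tasks = ["read_altimeter", "press_switch", "scan_radar"]
credit = {"read_altimeter": 2.0, "scan_radar": 3.0}

total = strategic_sa(workload) + tactical_sa(tasks, credit)
print(f"Total SA = {total}")
```

Note how the overload interval (workload 9.5) contributes nothing to Strategic SA, mirroring the rapid deterioration of SA near overload.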

This approach was first used in a study of airlift missions comparing two different C-130 aircraft conducting both a formation airdrop, and a single-aircraft approach and landing with maximum effort procedures. The older C-130H aircraft had a cockpit crew of four (pilot, copilot, navigator, and flight engineer) and predominantly analog instrumentation, while the newer C-130J aircraft had a cockpit crew of two (pilot and copilot) with modern digital avionics and enhanced automation features. Each cockpit crew member was modeled with discrete-event simulation to measure workload and situation awareness, both as individuals and as a team. Results showed that operators were able to maintain high SA during periods of high workload if the workload was attributed to tasks that gained SA. Also, while the modern avionics were able to substitute for human operators, this substitution did not result in significantly improved SA for the C-130J in either simulation [26].

By using simulation to estimate SA, designs of human-machine teaming systems can be effectively evaluated and potential degradations in human and mission performance can be identified.

7 Assessing Trust

Another key factor in the proper application of automation in human-machine teaming is characterizing operator trust in the automation. Trust is defined as the human’s confidence that an automated system will help him/her achieve his/her goals in a situation characterized by uncertainty and vulnerability [29].

Characterizing and modeling trust in human-automation interactions is imperative for successful performance and reduced workload, because operators who distrust an automated system will not use it effectively, thus losing the expected benefits of the automation. Calibrating an operator's trust in a system is necessary to prevent overtrust or distrust in the automation. Calibration refers to the relationship between an operator's trust in the automation and the automation's true capabilities [30]. Overtrust refers to poor calibration in which trust exceeds the automation's capabilities, and distrust refers to trust falling short of the automation's capabilities [30].

In order to model and simulate trust in automation, trust needs to be quantified. Recent work by Boubin et al. [31] has sought to quantify two aspects of trust: reliance and compliance. Reliance pertains to the human operator's state when an alert or alarm is silent [32]; Boubin et al. extend the definition of reliance to mean the acceptance of an automation's non-action. Conversely, compliance addresses the operator's response when the alarm sounds, whether true or false; Boubin et al. extend the definition of compliance to mean the acceptance of an automation's action. Using data collected from a human-in-the-loop experiment, Boubin et al. were able to create mathematical functions which model compliance and reliance based on taskload and automation type. These functions can be used by simulation models to capture the impacts of trust on human-machine team performance. This work is being extended to examine how a user's reliance and compliance rates are affected by degraded automation reliability. The research aims to provide further insight into how reliability can affect the human operator's performance based on the level of trust the operator has in the system.
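Once quantified this way, reliance and compliance become probability functions a simulation can sample. The sketch below shows one plausible shape; the logistic form and all coefficients are illustrative assumptions, not the fitted functions from Boubin et al.

```python
"""Sketch of taskload-dependent reliance/compliance probabilities for use
inside a human-machine teaming simulation.

The logistic/linear forms and all coefficients are illustrative
assumptions, not fitted experimental functions.
"""
import math

def reliance_probability(taskload, k=1.2, midpoint=4.0):
    """Probability the operator accepts the automation's non-action,
    modeled as increasing with taskload (busier operators rely more)."""
    return 1.0 / (1.0 + math.exp(-k * (taskload - midpoint)))

def compliance_probability(taskload, base=0.6, gain=0.05):
    """Probability the operator accepts the automation's action when it
    acts or alerts, drifting upward with taskload and capped at 1."""
    return min(1.0, base + gain * taskload)

for load in (2, 4, 6, 8):
    print(f"taskload={load}: reliance={reliance_probability(load):.2f} "
          f"compliance={compliance_probability(load):.2f}")
```

In a DES, each automation action (or non-action) would draw against these probabilities to decide whether the simulated human accepts it, letting trust effects propagate into team performance measures.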

In another experiment, Goodman et al. [22] incorporated similar reliance functions, accounting for taskload and agent response timing, into discrete event simulation models. These reliance functions were developed from previous experiments and produced a probability that the human would permit the agent to create a route. The function assumed that higher taskload would result in greater reliance. Agent response time was used to shift the function vertically, where quicker agent responses created higher probabilities of agent route creation and slower agent responses led to lower probabilities. These models were later supported by human-in-the-loop experimentation, which suggested that humans will take advantage of the opportunity to shed tasks as long as it is not detrimental to performance.

8 Assessing Fatigue

One of the many advantages of modeling is the ability to rapidly simulate human-machine teaming interactions that last for extended periods of time. Especially for military human-machine teams, it is not uncommon to have tasks that extend beyond an 8-hour work day, including night shifts or even multi-day shifts. Over these long periods of time, it is important to account for changes in the human operator's cognitive and physical abilities. Unlike its automated counterpart, the human operator is susceptible to declines in performance due to fatigue. Assuming that an operator's performance is constant, especially during extended task durations, would likely result in unrealistic over-estimates of team performance. Human cognitive and physical abilities degrade as time without sleep increases. This degradation can result in increased task times, error rates, and dropped tasks.

In order to effectively capture human performance under fatigue conditions, it is necessary to create models which account for increased task failure and task time due to these conditions. Recent work at AFIT has built upon previous fatigue models, developed by Gunzelmann and Gluck [33] using ACT-R, to incorporate fatigue mechanisms which account for performance declines due to microsleeps. A microsleep is a very short period during which no cognition occurs (i.e. a temporary episode of sleep lasting a fraction of a second). To implement microsleeps in an IMPRINT simulation model, unique functions were created for each task where microsleeps may occur. These functions determine the probability of a microsleep event occurring and are based on the amount of sleep loss and the number of microsleeps that have already occurred.
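A function of this shape might be sketched as follows. The exponential form and every coefficient are illustrative assumptions; the AFIT models fit task-specific functions, which these placeholders only mimic structurally (probability rising with sleep loss and with prior microsleep count).

```python
"""Sketch of a microsleep-probability function for a fatigue-aware task model.

The exponential form and all coefficients are illustrative assumptions,
not the task-specific functions fitted in the AFIT work.
"""
import math

def microsleep_probability(hours_awake, prior_microsleeps,
                           baseline=0.001, growth=0.15, history_gain=0.02):
    """Probability a microsleep interrupts the current task: grows with
    time awake (past an assumed 16-hour onset) and with the number of
    microsleeps already experienced, capped at 1."""
    p = baseline * math.exp(growth * max(0.0, hours_awake - 16.0))
    p += history_gain * prior_microsleeps
    return min(1.0, p)

for hours in (8, 16, 24, 36):
    p = microsleep_probability(hours, prior_microsleeps=3)
    print(f"{hours}h awake: p={p:.4f}")
```

During simulation, each task instance would draw against this probability; a triggered microsleep then adds dead time (or a dropped task) to the operator's timeline.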

Fatigue modeling research at AFIT has also examined vigilance decrement. In situations where operators have had appropriate sleep, operators can still experience fatigue during the performance of vigilance tasks. Vigilance tasks are characterized by long periods of sustained attention, commonly requiring operators to identify stimuli that are occurring at very low event rates. Work at AFIT has incorporated functions established by Giambra and Quilter [34]—which describe the relationship between vigilance task time and the increase in reaction time—into human-performance simulations in the cyber domain. This model used the vigilance decrement function to portray the workload and performance effects of fatigue on Air Force cyber-defense operators monitoring network traffic.
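Incorporating such a decrement into a simulation amounts to inflating baseline reaction times as time-on-task accumulates. The sketch below uses a simple linear slope as an illustrative stand-in for the fitted Giambra and Quilter relationship; the slope and baseline reaction time are assumptions.

```python
"""Sketch of applying a vigilance-decrement adjustment to reaction times.

The linear slope and baseline reaction time are illustrative assumptions
standing in for the fitted Giambra and Quilter relationship.
"""

def adjusted_reaction_time(base_rt_s, minutes_on_watch, slope=0.005):
    """Inflate the baseline reaction time (seconds) as time-on-task
    accumulates, modeling the vigilance decrement."""
    return base_rt_s * (1.0 + slope * minutes_on_watch)

# Reaction time to a network-monitoring event at several points in a shift.
for minutes in (0, 30, 60, 120):
    rt = adjusted_reaction_time(1.8, minutes)
    print(f"t={minutes:3d} min: RT={rt:.2f} s")
```

In a cyber-defense model like the one described above, each detection task would look up the operator's elapsed watch time and use the adjusted value in place of the nominal reaction time.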

By accounting for sleeplessness and vigilance decrement, modelers can better understand the performance of human operators in high-stress, high-attention situations over long periods of time. By creating models which account for human fatigue, system designers are able to understand the interaction between the design and a fatigued operator, and thus more accurately account for expected operator performance. This awareness can inform designs of human-machine teaming systems, hopefully increasing effectiveness and reducing the likelihood of errors due to fatigue.

9 Conclusions

Simulation has the potential to play a crucial role in the design and development of human-automation systems. By not requiring physical prototypes or human subjects, simulation provides a safe, affordable, and time-effective mechanism for evaluating system designs. These advantages make it easier to perform what-if analyses and to explore many more design options than would be possible through live testing. Additionally, simulation can be performed throughout all phases of the system development lifecycle, enabling early design decisions and trade-offs to account for human performance and human-machine interactions. The body of work described herein demonstrates how simulation is being used to capture complex human-machine interactions and to identify system-level, emergent performance outcomes which account for human qualities such as mental workload, situation awareness, trust, and fatigue. Properly accounting for human performance during system design and development will ultimately result in more effective human-machine teams.

10 Disclaimer

The views expressed in this paper are those of the authors and do not reflect the official policy or position of the United States Air Force, the Department of Defense, or the U.S. Government.