
1 Introduction

1.1 Project Background

This project has its origins in the mid-2000s, when researchers began developing distributed simulation tools that enabled interaction between ground controllers and aircrew in flight simulators over long distances. Research examining these interactions was carried out by geographically distributed teams occupying multiple facilities, eliminating the need for participating researchers to be co-located [1]. The present study followed a similar approach by leveraging existing distributed simulation capabilities at the University of Iowa Operator Performance Lab (OPL) and the California State University Long Beach Center for Human Factors in Advanced Aeronautics Technologies (CHAAT). This federation of simulations involved interaction of aircrew flying the Boeing 737-800 simulator at OPL with the simulation manager, confederate dispatcher, and air traffic controller at the CHAAT ground station [2]. The simulator at OPL is equipped with data recording capabilities specialized for assessing new technologies, with an emphasis on aviation-specific human factors considerations.

1.2 Study Outline

The Human Autonomy Teaming (HAT) software tool suite, installed on a Surface Pro tablet in the aircraft simulator (Fig. 1), provided pilots with a semi-automated electronic checklist, audio/voice cues and commands, and alternate airport destinations in case of emergency.

Fig. 1. Simulator configuration: Boeing 737-800 cockpit with HAT software running on a Surface Pro located to the left of the captain's seat

One of the features of the HAT software tool suite was a traffic display developed at NASA called the Traffic Situation Display (TSD). The TSD allowed pilots to see their own aircraft relative to nearby airports and other traffic in the area. A control panel to the left of the TSD allowed pilots to zoom in and out and pan around the display to view other regions as needed to support any required or desired planning. A screenshot of the tablet on the TSD page is shown in Fig. 2.

Fig. 2. Traffic Situation Display (TSD)

A large portion of the HAT tool's visual interface consisted of software developed by NASA called the Autonomous Constrained Flight Planner (ACFP). This tool was previously developed and tested in ground-based flight-following operations, in which controllers on the ground support multiple aircraft under their watch [3]. In the HAT condition, the ACFP allowed the pilot to adjust the criteria the algorithm uses to select alternate airports/runways. These 'weights' are set with slider controls, as seen in Fig. 3. A pilot can move the weights along the sliders to adjust which criteria should receive priority when the automation selects the top alternate airport candidates to present to the pilot.

Fig. 3. Slider controls can be found in the bottom left corner of this screenshot, allowing the pilot to adjust the importance of different criteria in ACFP
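To make the weighting scheme concrete, the sketch below shows one way a slider-driven weighted score could rank alternate airports. The criterion names, scores, and scoring formula are illustrative assumptions based on the description above, not NASA's actual ACFP algorithm.

```python
# Hypothetical sketch of weighted alternate-airport ranking, loosely modeled on
# the slider-based criteria weighting described for the ACFP. All names and
# values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Candidate:
    airport: str
    # Each criterion is pre-normalized to 0..1, where 1 is most favorable.
    scores: dict

def rank_candidates(candidates, weights):
    """Return candidates sorted by weighted score (highest first)."""
    def weighted_score(c):
        return sum(weights.get(k, 0.0) * v for k, v in c.scores.items())
    return sorted(candidates, key=weighted_score, reverse=True)

# Example: the pilot moves the "weather" slider up and the "distance" slider down.
weights = {"distance": 0.3, "weather": 0.9, "runway_length": 0.5}
candidates = [
    Candidate("KCID", {"distance": 0.9, "weather": 0.4, "runway_length": 0.7}),
    Candidate("KDSM", {"distance": 0.6, "weather": 0.8, "runway_length": 0.9}),
]
for c in rank_candidates(candidates, weights):
    print(c.airport)
```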

The automated checklist feature of HAT supported the study aircrew in completing pre-selected tasks for off-nominal events that would normally be accomplished manually. One goal of automating these tasks is to decrease the overall workload of the flight crew, allowing more attention to be devoted to aircraft control. A potential long-term outcome of transitioning tasks normally accomplished by human crewmembers to automation may be a reduction in the number of crew members required in the cockpit.

The HAT tool also provided the pilots with a mechanism to use voice commands to navigate the application, thus freeing their hands to perform other tasks in the cockpit. Since voice interaction is not a standardized feature in today's cockpits, a laminated quick reference card was provided for pilots to consult at any time during the simulations. This voice command feature enabled the pilot to call up approach plates and airport maps of various destinations as well as switch between screens, or tabs.

1.3 Participant Demographics

Twelve pilots participated in the study. All twelve were active (not retired) and, at the time, flying for a major US airline. All held current type ratings in Boeing aircraft (five in the Boeing 777, three in the Boeing 787, two in the 757/767, one in the 737, and one in the 747). Seven pilots had more than 10,000 h of flight time and five had less than 10,000 h. Pilot flight experience was not part of the experimental design but is reported here because it produced interesting results relating to the impact of experience (see Sect. 3) (Fig. 4).

Fig. 4. Total hours flown as line pilot

2 Scenarios and Conditions

A variety of scenarios, each containing an off-nominal event at one of three severity levels, were designed to prompt the pilots to use the HAT tool and procedures. The three severity levels were Light (L), Moderate (M), and Severe (S). The six non-normal events and their designated severity levels are shown in Table 1 below. A scenario typically lasted 15–20 min, although, depending on pilot actions, a few ran longer than 30 min.

Table 1. Off-nominal events

The scenarios were varied such that no pilot saw the same scenario more than once. Additionally, the scenarios were counterbalanced so that all appeared an equal number of times, both with and without the HAT tool, to allow comparisons of subjective workload and situation awareness ratings. Each scenario appeared six times in the HAT condition and six times in the No-HAT condition. Pilot participants experienced all six scenarios over the course of their day in the simulator (three in the morning and three in the afternoon). For each pilot, the scenarios were grouped by HAT condition, meaning they completed the three HAT and the three No-HAT scenarios consecutively (i.e., there was no alternating between conditions).

Pilots 1, 3, 5, 7, 9, and 11 completed three scenarios with HAT first and then three scenarios without HAT. Pilots 2, 4, 6, 8, 10, and 12 completed three scenarios without HAT first and then three scenarios with HAT. This alternating order between subjects was designed to help minimize order and learning effects. The experimental matrix for the study can be found in Fig. 5.

Fig. 5. Experimental run matrix for all 12 pilot participants (PPT) in this study
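As a simple illustration of the counterbalancing rule described above, the minimal sketch below reproduces the odd/even ordering of condition blocks. It does not generate the scenario-to-session assignment; the actual pairing is given by the run matrix in Fig. 5.

```python
# Minimal sketch of the condition-block counterbalancing: odd-numbered pilots
# fly the HAT block first, even-numbered pilots fly the No-HAT block first.
def condition_order(pilot_id: int):
    """Return the condition block order for a given pilot (1-indexed)."""
    return ["HAT", "No-HAT"] if pilot_id % 2 == 1 else ["No-HAT", "HAT"]

for pilot in range(1, 13):
    first, second = condition_order(pilot)
    print(f"Pilot {pilot:2d}: three {first} scenarios, then three {second} scenarios")
```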

3 Subjective Responses

For every simulator scenario, pilots completed NASA Task Load Index (TLX) [4] and 10-Dimensional Situation Awareness Rating Technique (SART) [5] questionnaires to capture their subjective workload and situation awareness ratings. Questionnaires were administered with paper and pen immediately following scenario completion.

A General Linear Model (GLM) Analysis of Variance (ANOVA) was performed on the TLX and SART scores to investigate effects of condition and scenario on these variables. Condition (HAT vs. No HAT) had no significant effect on TLX ratings (F1,54 = 2.19, p = 0.145) or SART scores (F1,53 = 0.23, p = 0.630), while scenario (L1, L2, M1, M2, S1, S2) showed a significant main effect on TLX ratings (F5,54 = 7.24, p < 0.005). Post-hoc Tukey pairwise comparisons showed that the L1 scenario produced significantly lower scores than the S1 (t = 4.284, p < 0.005) and S2 (t = 3.269, p = 0.022) scenarios, and the L2 scenario produced significantly lower scores than the S1 scenario (t = 3.221, p = 0.025). The M1 scenario produced lower scores than the S1 (t = 4.622, p < 0.005) and S2 (t = 3.608, p = 0.008) scenarios, while the M2 scenario produced lower scores than the S1 (t = 3.978, p < 0.005) and S2 (t = 2.963, p = 0.049) scenarios. The effect of scenario was expected by design and thus of limited practical significance; it did, however, confirm that the scenario difficulty levels generally met the intent of the experiment.
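For readers who want to reproduce this style of analysis, the sketch below shows a comparable main-effects ANOVA and Tukey post-hoc comparison using Python's statsmodels. The column names and the randomly generated placeholder scores are assumptions; the study's actual analysis used the recorded TLX and SART questionnaire data.

```python
# Sketch of a main-effects ANOVA (condition + scenario) with Tukey post-hoc
# comparisons across scenarios. Placeholder data only.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
scenarios = ["L1", "L2", "M1", "M2", "S1", "S2"]
df = pd.DataFrame({
    "condition": np.repeat(["HAT", "NoHAT"], 36),
    "scenario": np.tile(np.repeat(scenarios, 6), 2),
    "tlx": rng.normal(50, 15, 72),  # placeholder TLX scores
})

# Main-effects model: TLX ~ condition + scenario
model = smf.ols("tlx ~ C(condition) + C(scenario)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Post-hoc Tukey pairwise comparisons across scenarios
print(pairwise_tukeyhsd(df["tlx"], df["scenario"]))
```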

As discussed previously, post-hoc analysis suggested that experience was an important consideration in interpreting the results. Splitting the pilots into two groups, five pilots had fewer than 10,000 h of total flight time and seven had more than 10,000 h. A review of the data using cumulative histograms indicated that workload and situation awareness may have varied more with the presence or absence of HAT when overall flying experience was taken into account. Because the experience levels were unbalanced, a non-repeated-measures GLM ANOVA, also considering the effects of condition and scenario, was performed to investigate the impact of total flying time on TLX and SART scores. Results indicated that total flying time had a significant effect on TLX score (F1,64 = 5.22, p = 0.026). A post-hoc Tukey pairwise comparison showed that the group with higher total flying time had significantly lower scores overall (t = −2.285, p = 0.0257). The ANOVA did not show a significant effect of total flying time on SART score (F1,63 = 1.98, p = 0.165).

Figures 6, 7, 8 and 9 show boxplots and empirical CDFs of SART and TLX scores broken down by condition and experience group. A qualitative examination of these graphs reveals several interesting findings. In Figs. 6 and 7, the differences in SART are slightly more pronounced in the high total-flight-time group than in the low flight-time group. Within the higher flight-time group, the HAT condition produced the highest SART scores. As Fig. 7 shows, this difference is most evident at the higher percentiles of SART score (75th and above).

Fig. 6. SART scores broken down by condition (HAT vs. No HAT) and number of hours flown (more than vs. less than 10,000 h)

Fig. 7. Empirical CDF of SART scores broken down by condition (HAT vs. No HAT) and number of hours flown (more than vs. less than 10,000 h)

Fig. 8. TLX scores broken down by condition (HAT vs. No HAT) and number of hours flown (more than vs. less than 10,000 h)

Fig. 9. Empirical CDF of TLX total score broken down by condition (HAT vs. No HAT) and number of hours flown (more than vs. less than 10,000 h)

Workload as measured by the NASA TLX (Figs. 8 and 9), on the other hand, indicated different effects of condition on the two experience groups. Overall, TLX scores were lower for the group with higher flight time. Within this group, TLX scores were highest in the HAT condition, and the separation was greatest where TLX scores were highest (i.e., the 75th to 90th percentiles). Within the low-experience group, the HAT condition also produced generally higher TLX scores; contrary to the group with higher flight time, however, the separation was greater at the lower percentiles (10th to 25th).
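The empirical CDFs in Figs. 7 and 9 can be built directly from the raw scores; the sketch below shows the basic ECDF construction and a percentile lookup of the kind used in the comparisons above. The group labels and scores are placeholders, not the study data.

```python
# Sketch of an empirical CDF (ECDF) and percentile lookup by group.
# Placeholder scores only.
import numpy as np

def ecdf(values):
    """Return sorted values and their cumulative proportions."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Placeholder SART scores for two illustrative groups
scores_by_group = {
    ">10k hrs, HAT": [22, 25, 27, 30, 31, 33],
    ">10k hrs, No-HAT": [20, 23, 24, 28, 29, 30],
}

for label, scores in scores_by_group.items():
    x, y = ecdf(scores)
    # The 75th-percentile score is the smallest x whose cumulative proportion >= 0.75
    p75 = x[np.searchsorted(y, 0.75)]
    print(f"{label}: 75th percentile ≈ {p75}")
```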

4 Questionnaire

Pilots completed additional questionnaires (developed by the research team) after both HAT and No-HAT scenarios, designed to elicit information regarding various workload and understanding aspects of the tool and procedures. The questionnaire used a numeric scale covering specific components of, and overall feelings about, the ACFP tool within HAT. Pilots circled the number on the scale that best reflected their agreement with each statement in the questionnaire.

There was also a final questionnaire administered after pilots had experienced both conditions. Several of these questions asked for a preference between the conditions when considering the six different scenarios the pilot had experienced in the simulator. To determine whether the preferences for HAT or No-HAT were statistically different from no preference, we ran single-sample t-tests against the scale midpoint of 5 (No Preference). The preference was overwhelmingly in favor of the HAT condition in four of the five questions asked. In the remaining question (Q2, about situation awareness), the difference was non-significant. These questions are summarized in Table 2 and Fig. 10.

Table 2. Statistical significance of Questions 1–5 of the final questionnaire comparing HAT and no HAT conditions
Fig. 10. Questions 1 through 5 responses on the final questionnaire across all subjects (N = 12)
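The preference analysis reduces to a one-sample t-test of each question's ratings against the scale midpoint. The sketch below illustrates this with scipy; the ratings shown are placeholders, not the study responses.

```python
# Sketch of a one-sample t-test against the "No Preference" midpoint of 5.
# Placeholder ratings only.
from scipy import stats

midpoint = 5.0
responses_q1 = [7, 8, 6, 9, 7, 8, 6, 7, 9, 8, 7, 6]  # placeholder ratings, N = 12

t_stat, p_value = stats.ttest_1samp(responses_q1, popmean=midpoint)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```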

5 Pilot Suggestions

Pilots also provided a number of suggestions for improving the HAT tool for future use. Pilot feedback was considered critically important, as pilots are the end users of the tool. Among the suggestions was improving the reliability of the voice recognition and allowing it to accept natural language rather than a strict grammar set.

An implementation of the TSD screen with convective weather integrated was also requested by the pilot participants. This would paint a more comprehensive picture, helping pilots understand the situation they are in and why the ACFP may be providing the recommendations that it is. Without a depiction of the convective weather, pilots would sometimes question why a proposed route/destination avoided what otherwise appeared to be clear airspace.

There were also some interaction enhancements requested by pilots, such as a pinch-to-zoom feature for the TSD, as this is now commonplace among tablet applications in use today. Another request was to have the HAT tool suite automatically switch to the Approach or Runway tab if an airport chart was summoned by voice command and the display was not already on that tab.

6 Conclusions

Overall, the pilots' subjective feedback in the SART, TLX, and questionnaire comments indicates that the HAT tool suite and procedures may be useful cockpit tools to supplement current-day operations. Many pilots commented on the tool's usefulness as a confirmation of actions they were already going to take in several of the non-normal events. An analysis of the TLX and SART scores shows no significant detriment to workload or situation awareness with the presence of HAT in the cockpit. However, the effect of experience may warrant further evaluation. The subjective data analysis also suggests that pilots would welcome the software on an EFB or perhaps even in certified avionics in the future.