
1 Introduction

Aviation Full Flight Simulators can record and play back almost every element of a simulation to aid instruction. What cannot be monitored is where a pilot is looking. From their very first flight, all pilots will, at some time, conduct insufficient monitoring and instrument checks. An instructor's inability to identify a lack of checking can lead to a normalization of deviance towards bad practice. As a pilot looks forwards in a simulator, their face generally cannot be seen, and their scanning accuracy cannot be monitored by an instructor if no subsequent failure or omission occurs. At present there are no aviation simulators that use eye-tracking (ET) technology to aid instruction, despite countless articles espousing its benefits. Also notable is that no research investigates how instructors respond to this technology. What use is all the data in the world if it cannot be effectively interpreted? This paper discusses the importance of checking and scanning and then, by taking the instructor's perspective, investigates whether ET of student pilots could assist simulator instructors in identifying and enhancing their awareness of student errors. It also identifies whether instructors hold inaccurate preconceptions about which errors they are able to monitor.

1.1 Why Do Pilots Need a Disciplined Check or Scan Regime?

Poor visual scanning signals a split between the pilot and the automated system they are operating. The purpose of a systematic instrument check is to ensure that the pilot receives acceptably fresh information, at a sufficient refresh rate, to maintain a consistent level of SA. When a scan breaks down, the pilot's mental model of the aircraft's mode state, or of its geospatial and temporal position, becomes obsolete.

Errors or breakdowns in scanning and checks are invariably caused by: insufficient capacity leading to a reduction in Situational Awareness (SA); a distraction leading to subconscious re-prioritization; or a fundamental misunderstanding of what is required. The latter requires training and re-education and forms the basis of this paper. The differing scan pattern failures can be split into four types. (1) Incorrect scan. This misinterpretation of what is required may originate from poor initial training, a lack of competence, mental temporal or geospatial displacement following a loss of SA, or simply forgetting due to a lack of recent exposure. It has also been shown that inexperience can lead to a change in scan pattern [3]. (2) Degraded scan pattern. All pilots are susceptible to this monitoring degradation, and the brain is quick to prioritize what it considers worthy of attention in a scan [4]. Pilots who have not employed the scan in a while invariably scan at a slower rate than normal. (3) Non-existent scan. A complete failure to check a data source through distraction, re-prioritization or being unaware of the requirement. (4) Insufficient or inappropriate scan. When pilots drop elements of their scan, they often do not remember dropping them unless they are triggered to do so. Pre-conditioning from years of repetition can also lead a pilot to think they have conducted a check when in fact they have not. They may think they recall the parameters when they are only recalling scanning the source; hence, pilots can gain a false impression of their own ability to monitor [4]. It has also been shown that the more inexperienced pilots are, the longer their dwell time, with less regular fixations [1, 5] or less relevant fixations [6]. This can leave the scan neither efficient nor effective, largely because it leaves no time to perceive the data, let alone comprehend it [7].
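Once gaze data is available, the four failure types above lend themselves to simple automated checks. The following is a minimal sketch only: the area-of-interest (AOI) names, thresholds and expected scan are illustrative assumptions, not values from this study or any certified syllabus.

```python
# Hypothetical sketch: flagging scan-pattern failures from a fixation log.
# AOI names and thresholds below are invented for illustration.

EXPECTED_AOIS = {"airspeed", "attitude", "altitude", "mode_annunciator"}
MAX_DWELL_S = 2.0         # assumed limit for a single fixation
MAX_REVISIT_GAP_S = 10.0  # assumed max time between checks of one AOI

def classify_scan(fixations):
    """fixations: list of (timestamp_s, aoi, duration_s), time-ordered."""
    issues = []
    last_seen = {}
    for t, aoi, dur in fixations:
        if dur > MAX_DWELL_S:                      # type (2): degraded scan
            issues.append(f"degraded: dwell {dur:.1f}s on {aoi}")
        if aoi in last_seen and t - last_seen[aoi] > MAX_REVISIT_GAP_S:
            issues.append(                         # type (4): insufficient scan
                f"insufficient: {aoi} unchecked for {t - last_seen[aoi]:.0f}s")
        last_seen[aoi] = t
    for aoi in EXPECTED_AOIS - set(last_seen):     # type (3): non-existent scan
        issues.append(f"non-existent: {aoi} never fixated")
    return issues
```

Type (1), an incorrect scan, would additionally require comparing the fixation *order* against a taught sequence, which is deliberately omitted here.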

There has been some thought as to why inexperienced pilots struggle in this manner. Airbus directs pilots to always use the highest level of automation available, and modern flight decks rarely require pilots to conduct raw data approaches. This can lead to a 'misuse' of, or over-reliance on, automation and, in turn, a poor scan technique [8]. Primary Flight Display (PFD) scan technique is not taught in many organizations whose aircraft may be a trainee's first exposure to a PFD. Unless they have trained on a modern glass cockpit aircraft, it is not possible to know categorically that the trainee has developed the correct scan technique. Moreover, with modern aircraft, the pilot arguably manages the system rather than manually 'flying' it in the traditional sense. For this reason, it is essential that they know exactly what mode the airplane is in, something that can only categorically be known by a check of the mode annunciator.

To date, the most researched aspect of pilot eye movement is the scan pattern. Pilot scans can break down [8], and different pilots have differing scan rates [3, 9]. In these examples, it was ET that provided the information.

1.2 Identifying Poor Checking and Scanning in the Training Environment

When a student fails to carry out a manual action despite confirming that they have, it is clear to a vigilant instructor. When a student fails to check a data source despite having said that they have, it is far less obvious. A perennial and as yet unaddressed problem is that instructors cannot identify this failure, as they can rely only on head movement and the perceived direction of gaze. If there is no subsequent impact, this poor monitoring goes unnoticed. This can detrimentally reinforce bad practice through negative transfer [10], increasing the chance that the next time a mode or source of information is contrary to that which is required, it will not be spotted. More often than not, as no adverse consequences transpire, pilots may not even realize their monitoring is inadequate [11]. Relaxation can develop, leading to normalized deviance from the correct SOPs.

Training a pilot involves the introduction of new information and techniques, followed by a period of exposure and practice. Part of the training process is to capture any poor techniques before they lead to errors. The check or scanning error that occurs due to a fundamental misunderstanding of what is required presents an insidious threat to flight. In some circumstances these issues cannot be identified with current training tools and the latent error may be repeated hundreds of times over many years, with little or no impact - until the right combination of triggers create an accident opportunity [12].

At present, using Level D simulators, instructors are able to monitor and, through De-Brief Facilities (DBFs), graphically reproduce vast amounts of the simulation in real time. DBFs are unfortunately rare, but where they are available, the recorded session can be played back. The student will have a mental model both of their own performance and of what the correct performance should be; these two models may be further apart than the student realizes, and verbal analysis may not bridge the gap. If the student understands the correct theory, video playback helps them contextualize and adjust their own actions in line with their own mental model.

1.3 Eye Tracking in Flying Training

Legacy, rudimentary methods of observing where a student is looking involve a system of mirrors; however, the analyst cannot pinpoint exact fixation points, merely the rough direction of viewing [2]. It is only possible to identify an incorrect scan if parameters fall or remain outside limits. A system with the ability to monitor exact pupil movement therefore has utility in the aviation training environment. We know that ET is a useful aid to help train pilots, but few studies address the manner in which this training should take place, providing practical, repeatable and employable techniques that can be incorporated into the simulator training suite. What aviation simulators and DBFs do not currently offer is the ability to monitor where a pilot is looking. Integrating ET data with the DBF, for the purpose of identifying where trainees are looking and enhancing the de-brief, would create an exceptionally powerful training tool.

1.4 Aim and Objective

This study integrates pilot ET data with an aviation simulator DBF. The aim was to investigate whether ET made a significant difference to the identification of pilot errors in ground-based pilot training. To do this, the objective was to ascertain whether simulator instructors, both with and without the use of ET data, could identify whether a pilot was checking what they should be at the correct moment. Moreover, it was necessary to understand whether exposure to ET data changed instructors' pre-conceptions about how many errors they were able to see.

2 Method

2.1 Target Sample

Nineteen simulator instructors and examiners took part in the research. All subjects were either UK CAA Type Rating Instructors (TRI) or Examiners (TRE), or military Line Training and/or Air-to-Air Refueling instructors. The instructors were requested to support the research but were told no more about what would be involved. From a total of 33 company instructors, the participation rate of 57.6% easily ensured a probability of 0.95 that the sample mean obtained was within ±0.1 SD of the true population mean. Demographic data: gender (all male); age in years (M = 48.1, SD = 9.1); total flying hours (M = 9379.0, SD = 4708.7); total hours on an A330/Voyager (M = 2484.2, SD = 1790.8); and total years as an A330/Voyager instructor (M = 6.38, SD = 6.01). Note: the Voyager is a military variant of the A330-200.

2.2 Experiment Hardware

Simulator.

Three profiles were pre-recorded in a Thales A330-200 Level D simulator.

Eye Tracking.

ET data was recorded using the Pupil Labs eye tracker.

Simulator DBF.

The Thales DBF, shown in Fig. 1, comprises a desktop computer and a combination of monitors and speakers linked to the flight simulator. The instructor is able to record parts or all of the simulator session, managed from their seat in the simulator, and these recordings can be played back on the DBF. The DBF reproduces the flying displays, all sound, and representations of the controls for the spoilers, flaps, thrust levers and landing gear. Additional information includes rearward-facing camera images, flying control position data, SatCom data and a recording management window. A fourth monitor, off screen to the right, displayed an ATC radar picture of the aircraft. The research pre-recordings were stored on the DBF database.

Fig. 1.

Thales De-Brief Facility, in Voyager configuration, shown alongside eye-tracking data.

Questionnaire and Measure.

The questionnaire comprised 26 questions, answered using a mix of Likert-type scales and free description. It was answered in three Phases: 1 - pre-DBF exposure; 2 - post-DBF exposure (without ET augmentation); and 3 - post-DBF exposure (with ET augmentation). The three Phases of questions sought to establish the subjects' opinions on the efficacy of the simulator and the DBF as environments in which to identify pilot errors and omissions.

2.3 Experimental Design

During the pre-recording of the three simulator profiles, an ET device, linked to a laptop running the ET program, was worn by the right-hand-seat pilot, the Pilot Monitoring (PM). At pre-briefed moments, and contrary to Standard Operating Procedures (SOPs), the PM deliberately either avoided checking a necessary instrument or fixated on one for too long. For the data gathering, each subject was seated at a table in front of the DBF screens. Subjects first answered Phase 1 of the questionnaire and watched the three profiles on the DBF. Subjects then answered Phase 2 of the questionnaire and watched the three profiles again, this time with ET data run in parallel from the same starting point. Subjects finally answered Phase 3 of the questionnaire. As the data available in each phase developed, certain question sets were repeated, capturing any changes of opinion.
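Running the ET data "in parallel from the same starting point" amounts to aligning two timestamped streams: gaze samples and DBF playback time. A minimal sketch of such an alignment follows; the tuple layout and sample values are assumptions for illustration, not the Pupil Labs export format.

```python
import bisect

# Hypothetical sketch: find the gaze sample nearest to a playback time.
# Assumes gaze samples are (timestamp_s, x_norm, y_norm) tuples, sorted
# by time, with both streams zeroed at the same starting point.

def gaze_at(samples, t):
    times = [s[0] for s in samples]
    i = bisect.bisect_left(times, t)
    if i == 0:
        return samples[0]
    if i == len(samples):
        return samples[-1]
    before, after = samples[i - 1], samples[i]
    # Pick whichever neighbouring sample is closer in time.
    return before if t - before[0] <= after[0] - t else after

# Three samples at a nominal 25 Hz, normalized screen coordinates.
samples = [(0.00, 0.50, 0.50), (0.04, 0.52, 0.49), (0.08, 0.55, 0.47)]
```

In a real integration the gaze point would then be drawn over the corresponding DBF frame; interpolation between neighbouring samples is an obvious refinement.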

ET was not mentioned until just prior to the ET exposure, to avoid preconception bias and to ascertain the specific areas of instruction on which each subject was focused. The ET data was presented so as to mimic its integration into the DBF, see Fig. 1. The two exposures were the independent variables and the questions were the dependent variables. Figure 2 shows the additional data with which ET can augment a DBF.

Fig. 2.

Data available from ET.

3 Results

3.1 Challenging Pre-conceptions

The difference in how many of a pilot's actions the subjects thought they could monitor during simulation was tested using a repeated-measures ANOVA examining instructors' responses in Phases 1, 2 and 3. No significant difference was found between instructors' responses, F(2,36) = 0.368, p = 0.694, \( \upeta_{\text{p}}^{2} = 0.02 \). The null hypothesis that 'there is no significant difference among these three scenarios' can be accepted. Next, the difference in how many of a pilot's actions the subjects thought they could review in the DBF was tested. A repeated-measures ANOVA was also conducted, and a significant difference in instructor responses across the three Phases was found, F(2,36) = 8.348, p < 0.05, \( \upeta_{\text{p}}^{2} = 0.317 \). Phase 1–3 rating scale means and question decodes are shown in Fig. 3. The null hypothesis that 'there is no significant difference among these three scenarios' can be rejected. ANOVA results for questions relating to DBF review are in Table 1. Additionally, a post-hoc comparison by Tukey HSD indicated a significant difference between Phase 1 (M = 6.95, SD = 1.43) and Phase 3 (M = 7.53, SD = 1.31). There was also a significant difference between Phase 2 (M = 6.47, SD = 1.47) and Phase 3 (M = 7.53, SD = 1.31), see Table 2.
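For transparency, a one-way repeated-measures ANOVA of this form can be sketched by hand. With 19 subjects over 3 phases the degrees of freedom are (k-1) = 2 and (k-1)(n-1) = 36, matching the F(2,36) values reported above. The data below are synthetic, not the study's ratings.

```python
# Minimal one-way repeated-measures (within-subjects) ANOVA, pure Python.
# Illustrative only: replace `data` with real subject-by-condition ratings.

def rm_anova(data):
    """data: list of subjects, each a list of k condition scores.
    Returns (F, (df_cond, df_err), partial eta squared)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Between-subjects variability, removed from the error term.
    ss_subj = k * sum((sum(row) / k - grand) ** 2 for row in data)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_err = ss_total - ss_subj - ss_cond
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_err / df_err)
    eta_p2 = ss_cond / (ss_cond + ss_err)
    return f, (df_cond, df_err), eta_p2
```

In practice a library routine (e.g. a statistics package's repeated-measures ANOVA) would be used; the hand computation simply shows where the reported degrees of freedom and partial eta squared come from.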

Fig. 3.

Phase 1–3 rating scale Means for the number of a pilot’s actions the subjects thought they could monitor during simulation and in DBF review.

Table 1. Summary of within-subjects ANOVA
Table 2. Summary of post-hoc Tukey HSD for comparable means.

Another repeated-measures ANOVA was conducted on responses to the three Phases to test the difference between questions 7, 12 and 12C. 12C was the answer that some instructors chose to change Q12 to when given the option in Phase 3. In other words, 'now they had been made aware of some of the errors and actions they had been missing, did they wish to revise their previous rating?'. This revision did not suggest that ET augmentation was now available, but sought to challenge instructors' pre-conceptions about the efficacy of their monitoring.

Again, there was a significant difference in instructors' responses, F(2,36) = 7.448, p < 0.05, \( \upeta_{\text{p}}^{2} = 0.293 \). The null hypothesis that 'there is no significant difference among these three scenarios' can be rejected. A post-hoc comparison by Tukey HSD indicated a significant difference between Phase 1 (M = 6.95, SD = 1.43) and Phase 3C (M = 5.79, SD = 1.62).

Showing subjects what they had missed challenged their pre-conceptions of their ability to identify error using the DBF, and changed their opinion.

3.2 ET as an Instructional Tool

When rating how effective the subjects thought ET would be in improving an instructor's ability to monitor correct SOPs and scan patterns, responses in Phase 2 (M = 7.37, SD = 1.21) rose in Phase 3 (M = 7.68, SD = 1.25) after exposure to ET data. Subjects then rated the efficacy of the ET data, first as a stand-alone tool (M = 6.68, SD = 1.60) and then when used in conjunction with the DBF (M = 7.89, SD = 1.29), a notable increase. Finally, subjects were asked a standalone question to assess how effective they thought they were at integrating ET data with the DBF data (M = 6.79, SD = 1.03).

Pilot Error Analysis.

Errors identified by the subjects in profiles 1, 2 and 3 were noted down in Phases 2 and 3. Across all subjects, 42 separate error types were identified from a total of 93 individual observations, before the addition of ET data. Errors included ‘not setting Missed approach alt’, ‘not checking approach lane prior to line up’ or ‘Side Stick (SS) in wrong position at take-off’. After the introduction of ET, 23 additional error types were identified from a total of 88 individual observations. Notably, from the 23 additional errors identified in Phase 3, only one was also captured in Phase 2. Out of a total of 64 separate errors, 22 new errors (34.4%) were identified with ET augmentation. The number of error observations in Phase 2 (M = 4.74, SD = 1.58) was 11.3% of the Phase 2 total observations rising in Phase 3 (M = 5.84, SD = 2.33) to 25.4% of the Phase 3 total. It could also be seen that every new error identified could be directly attributed to the availability of additional ET data.
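The error-type comparison above (42 types before ET, 23 additional after, with only one overlap) is a set comparison at heart. A small sketch with invented error labels, standing in for the study's real coding, makes the bookkeeping explicit:

```python
# Hypothetical sketch: error-type overlap before and after ET augmentation.
# Labels are invented stand-ins for the coded error types in the study.

phase2 = {"missed_app_alt", "lane_not_checked", "ss_wrong_position"}
phase3 = {"ss_wrong_position", "fma_not_checked", "no_alt_crosscheck"}

new_with_et = phase3 - phase2   # types only identified once ET was added
overlap = phase3 & phase2       # types identified in both phases
all_types = phase2 | phase3     # every separate type observed
pct_new = 100 * len(new_with_et) / len(all_types)
```

Applied to the real coding, `len(all_types)` would be 64 and `len(new_with_et)` 22, giving the 34.4% figure reported.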

Pilot Action Analysis.

Pilot actions that instructors reported not being able to see were captured both before and after the introduction of ET, from 52 individual observations. A total of 15 separate actions were identified across all subjects before the addition of ET data, and 9 separate actions after the introduction of ET. Only 2 of the 9 actions from Phase 3 were mentioned in Phase 2. From this we see that, out of a total of 24 separate actions, 7 new actions (29.1%) were introduced following the ET augmentation. Of the 15 separate actions 'not seen by the instructor' in Phase 2, 13 of the 31 observations (42%) would have been remedied by ET. Additionally, two subjects did not consider that there were any actions they could not see, although once they had seen the ET they were able to identify some. In Phase 3, having seen the ET, 21 additional observations were added, of which 7 actions 'not seen by the instructor' had not previously been mentioned. This showed that, following exposure to ET, instructors identified an additional 47% of actions not seen. Of these, 71% would be solved by ET.

Improving the Current Information Sources.

Table 3 shows the methods that instructors believed would increase the level of relevant information available to them, as noted before the introduction of ET data. The actions highlighted orange are those that ET would remedy, 45.5% of the total methods.

Table 3. Methods to improve information availability.

4 Discussion

ET has been successfully implemented in training before [9]; however, this study was able to robustly support the hypothesis that ET made a significant difference to an instructor's identification of pilot errors in ground-based flying training. This paper does not attempt to address more specific outputs from ET data such as dwell time, lack of fixations or random scan patterns. There has been much research on these subjects, and it is known that pilot weaknesses in these areas can be identified through ET. What appears to have been ignored is whether instructors are able to process, interpret and utilize this data and, moreover, whether its integration within DBFs is accessible and of positive benefit.

Challenging Pre-conceptions.

Q6, 11 and 19 related to errors identified during simulation. Whilst showing the subjects DBF data directly affected their opinion relating to Q7, 12 and 20, this opinion had to be inferred when considering simulation; this is discussed further under 'ET as an instructional tool'. For Q6, 11 and 19, ratings indicated that, even with the exposure to the DBF, subjects were content with their assessment of their ability to spot errors. It was assumed that, with the additional ET exposure in Phase 3 and having noted down the considerable number of additional errors that they missed in Phase 2, they would again reduce their rating of the errors they were able to identify. Instead, the Phase 3 mean stayed constant. This demonstrates that the ET data exposure in the DBF did not alter the instructors' pre-conceptions of what errors they were able to identify in the simulator.

Q7, 12 and 20 related to errors identified using the DBF. During the research it became apparent that many of the subjects did not routinely use the DBF for their instructional de-brief. Some were contractors who also instructed for other companies; a DBF is a rare commodity and they usually only had access to one at RAF Brize Norton. Others did not consider that the DBF added enough weight to a de-brief to justify its setup and use. For these reasons, they were not practiced in its functionality and refrained from using it. In conversation, only around five (26%) of the instructors stated that they used it regularly, notably those with more experience. As it was known that so few instructors used the DBF to augment de-brief, it was necessary to track their evaluation of its worth, both before and after seeing it in action. The mean rating reduction of 0.48 between Phase 1 and 2, t(19) = 1.92, p = 0.07, indicated that instructors felt they were now less able to identify errors, having just seen the DBF in action. An explanation for this fall is that, on exposure to the DBF and the excess of information available, instructors felt unable to monitor all the data at once and were thus prone to missing errors. By adding ET data, subjects' mean rating score increased from Phase 2 by 11.8%, to 84.7% of maximum rating, evidence that instructors recognized and accepted missed errors of which they had previously been unaware. This strongly validated the positive effect that ET has in helping to identify errors in training. When given the option to revise their Phase 2 rating based on what they now knew, Phase 1 to Phase 2 gave a statistically significant drop, t(19) = 4.01, p < 0.05. This change demonstrated that the exposure to ET data significantly altered the instructors' preconceptions of what errors they were able to identify using the DBF.
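The phase-to-phase contrasts above are paired, within-subject comparisons, and the underlying paired t statistic is simple enough to sketch directly. The ratings below are synthetic, not the study data.

```python
import math

# Minimal paired-samples t-test, pure Python.
# a and b are the same subjects' ratings under two conditions.

def paired_t(a, b):
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)
    return t, n - 1
```

The returned degrees of freedom are n - 1 for n paired observations; the resulting t would then be compared against the appropriate t distribution for a p-value.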

Eye Tracking as an Instructional Tool.

Instructors rated the use of ET as a stand-alone instructional de-brief tool, considering it to be the sole source of information. This question assessed the value the subjects gave to the ET data. A 74% mean rating rose to 88% when consideration was given to ET being used in conjunction with the DBF, i.e. Phase 3. Instructors clearly assessed that the combination of both sources of data was a powerful training tool.

Instructor Error Awareness.

It has been shown that the instructors' awareness of the limitations of the simulator and DBF is broadly aligned. In Phase 2, the number one perceived issue related to SS input visibility. This problem, largely unique to Airbus, was a contributory factor during the Air France 447 A330 crash, where neither pilot could see the other's SS, exacerbating a lack of awareness of their opposing inputs [13]. Although not related to ET, it is interesting to note that detractors of Airbus often cite the SS as a negative of the cockpit design, and the most high-profile Airbus accident in the last few years was linked to an inability to see SS inputs. It is quite possible that this knowledge drives the SS issue to the forefront of instructors' minds. The issue with the most overall observations related to tracking of gaze and scanning. Even before the introduction of ET data, the number of 'cannot see where eyes are looking' observations, in addition to the many other associated eye gaze issues, demonstrated that not being able to see what the trainee is looking at is a concern within the instructor cadre.

Error Response.

The unusual scenario of watching the DBF without first having sat through the simulator profile presented some compelling data. Phases 2 and 3 exposed all instructors to exactly the same profiles. In Phase 2 the instructors were not able to see the scanning and checking errors, so these were not necessarily their focus. Their standard brief asked them to identify the recorded pilots' compliance with SOPs, leading them to seek other mistakes. During the pre-recording, and despite their best efforts, minor errors were made by the pilots in both setting up the flight deck and performing the profiles. In Phase 2, each subject identified a mean of 4.74 (SD = 1.58) out of a total of 42 errors, demonstrating that instructors seek different sources of information when un-prompted. It shows that different things are important to different people at different times. It is difficult to explain these varying foci; however, they would suggest that the multiple sources of information that individuals are exposed to in their daily life have mentally primed them differently. They potentially have a cognitive bias towards certain errors and hence seek them out when their attention is not targeted elsewhere. Support for this came when interrogating actions that instructors did not think they could see; their primary focus in Phase 2 was on the SS, potentially due to the Air France 447 accident twinned with Airbus' use of a SS. From the number of observations, we also see that in Phase 2, where there was no specific focus to the error identification, the ratio of observations to separate error types was 2.21 to 1. After the introduction of ET, where error identification became focused, the ratio of observations to separate error types was 3.83 to 1. This clearly demonstrates that, with targeted training, we can align instructor focus and increase identification of error.

5 Conclusion

This research conclusively shows that using ET allowed instructors to spot increased numbers of errors. As trainees look forwards in the simulator, their faces generally cannot be seen; when they scan incorrectly and monitor poorly, their errors cannot be spotted by an instructor if no subsequent failure or omission occurs. Integrating ET information was considered challenging due to the additional volume of data; however, it is thought to significantly improve the DBF capability, creating additional training opportunities for students. It was also shown that, if conducting training using the DBF, the target of instruction must be focused, otherwise error identification is random. ET's key advantage, however, is that it measurably focuses and enhances instructor attention on identifying checking, scanning and monitoring errors. Observing the DBF, both with and without ET augmentation, increases instructor understanding of what they are unable to identify in both simulator and DBF. ET even changed instructors' pre-conceptions regarding the efficacy of their trainee monitoring, reducing their levels of false confidence and educating them on how they could be better.