1 Introduction

Past work on human performance when using enhanced flight vision systems (EFVS) for approach and landing is fairly extensive and definitive, including low-fidelity simulator studies and analyses (Beier and Gemperlein 2004; Endsley et al. 2000; Todd et al. 1992; Yang and Hansman 1994), high-fidelity simulator studies (Bailey et al. 2010; Kramer et al. 2017), and actual flight tests (Arthur et al. 2005; Kramer et al. 2014).

However, the conclusions of such past work apply only to the effect of EFVS on nominal performance. That is, conclusions are drawn about the effect of EFVS on mean error or performance, but the effect of EFVS on the likelihood of deviation has not been studied. This is an important gap, because it matters whether the likelihood of significant deviation increases, especially if the use of EFVS in high-volume commercial passenger operations becomes commonplace, whereas its use now is largely limited to cargo and business flight operations.

Unfortunately, no method has previously been applied to examine changes in the likelihood of a deviation for flight operations. In this work, a method for studying this likelihood was developed and is described. The method consists of collecting the same experimental data used in past studies, but examining the tails of the distribution of performance measures rather than the mean.

In this paper, the method is described and the results of the simulation experiment are provided. A discussion of those results and the recommended work going forward are also provided.

2 Method

Common statistical tests assume that the distributions of the dependent variables are symmetric, often normal, and then investigate differences between the locations of the distributions, usually via tests on differences in means and/or medians. However, similarly shaped distributions may have nearly identical means and medians yet differ in the "fatness" of their tails.

Differences in the tails of distributions are potentially important, as they indicate differences in the probabilities of significant departures from the center of the distribution. For example, the tails of the distribution describing a manufacturing process can indicate how likely that process is to produce defective products. The mean/median of that distribution only indicates the location of the distribution, identifying the 50th percentile or "average" value.

For symmetric distributions, including normal distributions, such differences would likely appear as differences in the kurtosis of the distribution. However, if the shape of the distribution is not symmetric or is generally not well behaved, such simple metrics may not provide sufficient insight. Moreover, even if a metric such as kurtosis is identified as different between two distributions, it is then left to determine which parts of the tails differ and in what ways.

Therefore, a nonparametric, bootstrap-based method is used in this work to identify whether the areas under the tails of different probability density functions (PDFs) are significantly different. The particular portion of the tail to be studied is a parameter, set to the percentiles of interest, allowing one to make statistically supported conclusions about differences in the likelihood of specifically large deviations from the mean of the performance parameter in question.

2.1 Bootstrapping to Compute Confidence Intervals on Percentiles of a Distribution

Simulation runs provide a sample of measures of an operator's performance. Assuming that sample is representative of the population, it can be resampled with replacement to produce numerous bootstrap samples. Any particular statistic of each resample's distribution can be computed, such as the mean, variance, and, most importantly in this case, the nth percentile. The range within which 90% of the resampled percentile estimates fall is the 90% confidence interval on that percentile.

Given two samples from different conditions, concluding that a sample comes from a different population when its median nth percentile falls outside the other sample's 90% confidence interval would be accurate 90% of the time. This is equivalent to parametric testing with α = 0.10.

One determination that must be made is the number of bootstrap samples to generate. With few samples, the estimates may vary widely and will not form a good distribution of estimates, so ideally one creates a large number of samples. Since computation is inexpensive, thousands to tens of thousands of samples are typically generated. In practice, it is only necessary to take enough samples that the estimate stops changing. This can be checked by examining the convergence of the estimates and stopping once they converge within a desired range.
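As a minimal sketch of this bootstrap procedure (the function name, parameters, and the simulated data are illustrative, not taken from the study):

```python
import numpy as np

def bootstrap_percentile_ci(sample, pct, n_boot=10_000, ci=90, seed=0):
    """Bootstrap the pct-th percentile of a sample.

    Resamples with replacement, computes the percentile of each
    resample, and returns the median estimate together with the
    central ci% interval of the bootstrap estimates.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    # Each row is one bootstrap resample, the same size as the sample.
    resamples = rng.choice(sample, size=(n_boot, sample.size), replace=True)
    estimates = np.percentile(resamples, pct, axis=1)
    lo, hi = np.percentile(estimates, [(100 - ci) / 2, (100 + ci) / 2])
    return np.median(estimates), (lo, hi)

# Example: 90% CI on the 90th percentile of simulated error durations.
durations = np.random.default_rng(1).exponential(scale=30.0, size=200)
median_est, (lo, hi) = bootstrap_percentile_ci(durations, pct=90)
```

In practice, one would also confirm convergence by checking that `median_est`, `lo`, and `hi` stop changing appreciably as `n_boot` grows.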

2.2 Performance Parameters Measured

In simulation, one can record a large number of performance parameters. In this study, which was primarily a test and demonstration of a method, we focused solely on deviation from the desired glideslope and localizer course.

One complexity is that although a simulator can record such deviations at a high rate, often as much as 60 Hz, each such observation is not independent of the others and therefore cannot be used as an observation for statistical testing purposes. Such observations could only be analyzed using time-series methods.

Instead, a novel method was used in this study. Deviations were compiled into error events, each having a particular duration and a maximum deviation magnitude (extent). An example is shown in Fig. 1. Although there is no formal proof, such error events are likely independent of one another, and therefore their durations and extents can be treated as separate observations for statistical analysis purposes.

Fig. 1. Determination of error extents and durations.

In Fig. 1, each zero crossing is identified, and the regions between zero crossings are considered errors. Filtering should be done to eliminate errors of such small duration or extent that operators would not have corrected them. Duration is the time range over which the error occurred, and extent is the maximum magnitude of the deviation. Also shown in Fig. 1 are the "delay," "gain," and "lag." Delay refers to the time from error inception to the start of a correction. Gain and lag are parameters of the operator's response, calculated by applying a McRuer crossover model to each error (Landry 2014; McRuer and Graham 1965). These latter three parameters, however, were not examined in this work.
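The event-extraction step described above can be sketched as follows; the filtering thresholds here are illustrative placeholders, not values from the study:

```python
import numpy as np

def extract_error_events(deviation, dt, min_duration=0.5, min_extent=0.01):
    """Split a deviation time series into error events at zero crossings.

    Each event spans the samples between consecutive zero crossings.
    Returns (duration, extent) pairs, filtering out events too brief or
    too small for an operator to have corrected.
    """
    deviation = np.asarray(deviation)
    sign = np.sign(deviation)
    # Indices where the signal changes sign mark event boundaries.
    crossings = np.where(np.diff(sign) != 0)[0] + 1
    boundaries = np.concatenate(([0], crossings, [deviation.size]))
    events = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        duration = (end - start) * dt       # event length in seconds
        extent = np.max(np.abs(deviation[start:end]))  # peak deviation
        if duration >= min_duration and extent >= min_extent:
            events.append((duration, extent))
    return events

# Example: a 60 Hz trace with one mid-record zero crossing.
t = np.arange(0, 10, 1 / 60)
trace = np.sin(2 * np.pi * 0.1 * t)
events = extract_error_events(trace, dt=1 / 60)
```

The resulting lists of durations and extents can then feed directly into the bootstrap procedure of Sect. 2.1.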

Of interest in this work is the location of the 90th percentile of error duration and error extent, specifically for lateral error relative to the localizer course. These correspond to long-duration error events and large error extents. The analysis can therefore shed light on whether the use of enhanced vision systems has an effect on the likelihood of errors of long duration or large extent.

2.3 Specific Method Used

With reference to the above, five participants flew instrument landing system (ILS) approaches starting approximately 10 nautical miles (NMi) from the runway threshold. Pilots had a HUD that, in the treatment condition, showed the outside view as it would appear if an EFVS were in use. (In the baseline condition, the HUD operated normally, with the external view in sight through the HUD.)

Moreover, the visibility on the approach was unrestricted, but in the treatment condition, visibility outside of the HUD was zero, meaning the pilots had to rely only on the HUD with EFVS to fly the approach and landing. Obscuration of the view outside of the HUD was accomplished by manually blocking the external view portion of the monitor in the treatment condition.

Of interest to this work was the aircraft's horizontal position, which was compared with the desired horizontal flight path, as identified by the localizer course, to generate the participant's error from the desired flight path. From this, the procedure described above was used to evaluate differences between the treatment and control conditions.
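The full comparison between conditions can be sketched as below; `differs_at_percentile` is an illustrative name, and the example data are synthetic, not from the experiment:

```python
import numpy as np

def differs_at_percentile(treatment, control, pct, n_boot=10_000, seed=0):
    """Check whether the control condition's median pct-th percentile
    falls outside the treatment condition's 90% bootstrap confidence
    interval, mirroring a parametric test at alpha = 0.10."""
    rng = np.random.default_rng(seed)

    def boot_percentiles(x):
        x = np.asarray(x)
        resamples = rng.choice(x, size=(n_boot, x.size), replace=True)
        return np.percentile(resamples, pct, axis=1)

    # 90% CI on the treatment condition's pct-th percentile.
    lo, hi = np.percentile(boot_percentiles(treatment), [5, 95])
    # Median bootstrap estimate of the control condition's percentile.
    ctrl_median = np.median(boot_percentiles(control))
    return not (lo <= ctrl_median <= hi)

# Example with clearly separated synthetic distributions.
rng = np.random.default_rng(2)
control = rng.normal(0, 1, 300)
treatment = rng.normal(10, 1, 300)
different = differs_at_percentile(treatment, control, pct=90)
```

Here a return value of `True` corresponds to concluding, at the 90% level, that the two conditions differ at the chosen percentile.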

3 Results

For error duration, the box plots of the locations of the 10th and 90th percentile lateral error durations for the treatment (obscured/EFVS on HUD) and control (visual/HUD) conditions are shown in Fig. 2.

Fig. 2. Lateral error duration boxplots.

Fig. 3. Lateral error extent boxplots.

The median 10th percentile in the treatment condition was 2 s, with a 90% confidence interval of [1.97, 2.98] s. The median 10th percentile in the control condition was 3 s, which falls just outside the treatment condition's 90% confidence interval and is approximately one second longer than the treatment median.

The median 90th percentile in the treatment condition was 105 s, with a 90% confidence interval of [87, 175] s. The median 90th percentile in the control condition was 102 s, which falls within that interval.

Of note is the larger number of "outlier" points in the treatment condition, shown as asterisks in Fig. 2. Since this is a boxplot of the location of the 90th percentile error duration, these asterisks represent samples where the 90th percentile was substantially higher than the median 90th percentile within the treatment condition.

For error extent, the box plots for the error extents in the treatment (obscured/EFVS on HUD) and control (visual/HUD) conditions are shown in Fig.Ā 3.

For the 10th percentile extent, the median 10th percentile in the treatment condition is 0.0043 dots, with a 90% confidence interval of [0.0037, 0.0062] dots. The median 10th percentile in the control condition is 0.0093 dots, which falls outside that interval.

For the 90th percentile extent, the median 90th percentile in the treatment condition is 0.46 dots, with a 90% confidence interval of [0.32, 0.56] dots. The median 90th percentile in the control condition is 0.79 dots, which falls outside the treatment condition's interval.

4 Discussion

The suggestion from the error duration data is that longer duration errors are no more likely in the EFVS treatment condition than in the control condition, but that shorter duration errors are more likely in the treatment condition. However, the overall difference for the shorter duration errors is very small and is likely not of practical significance. It may, however, provide support for a contention that having only the EFVS "window," as seen through the HUD, may focus pilots on small errors.

The suggestion from the error extent data is that there are differences in error extent. Specifically, larger extent errors are less likely, and smaller extent errors are more likely, when using EFVS. While this seems like a positive effect of using EFVS, the overall error extent differences are small, on the order of one-third to one-half of a dot of localizer deviation. Depending on the distance from the runway at which these differences occur, such differences are not necessarily unsafe. This raises the concern, consistent with the error duration results, that the EFVS "window" focuses pilots on relatively small errors. While this may be an overall benefit from an accuracy perspective, since the errors in question are probably not significant from a safety standpoint, it may be that pilots are over-controlling, a situation that could lead to unnecessarily high workload during approach.

Overall, the suggestion from the data is that, when using EFVS, error duration is not strongly affected, but error extents are limited compared to non-EFVS approaches. As suggested, this may be due to pilots' lack of contextual cues as to the safety of deviating from the desired flight path. If true, this suggests that EFVS could result in higher workload, although such a finding would be at odds with the subjective results of prior flight-test work.

5 Conclusions and Future Work

The results described herein are from a small sample, using a low-fidelity simulator. Therefore, caution is advised against over-generalizing the findings. Of interest in this work was the development and test of a method for examining the probability of large and small errors, instead of the average error. Such a method was produced and demonstrated to be practical.

The results regarding the effect of EFVS on lateral error duration and extent should be viewed as somewhat anecdotal until replications, preferably using data from higher-fidelity simulators and/or flight tests, are produced. Nonetheless, this work can be viewed as one set of results suggesting that EFVS may have some effect on pilot performance.