
1 Introduction

In 2017, the number of people traveling by aircraft reached a new high of almost 4 billion [1]. Because of this increase, it is necessary to explore methodologies that increase efficiency in air traffic control (ATC) while maintaining the same level of safety. One technology that supports this process is the remote surveillance of airports, also called remote tower operations (RTO) [2]. Research began with single remote tower operations and is driven by economic advantages and by the goal of an optimal workload distribution for the Air Traffic Control Officers (ATCOs) of smaller airports. The low traffic volume at some smaller airports can leave ATCOs under-utilized and is, at the same time, unprofitable. Single remote tower operations already allow the provision of air traffic services for several aerodromes to be centralized in one center, as implemented at Leipzig airport [3]. Multiple Remote Tower Operations (MRTO) take this approach a step further by combining several aerodromes in one workstation.

The concept of MRTO is not only a more efficient way of managing the costly time resources of an ATCO but also reduces the danger of boredom [4] and sleepiness by providing a continuous workload [5]. In the context of this paper, MRTO is defined as an ATCO controlling several aerodromes simultaneously, in contrast to controlling several aerodromes one after another. The workplace, described in detail below, allows the ATCO to monitor and manage up to three aerodromes via live video, radar, a planning tool, and radio communication.

From a human factors perspective, the particular challenge of MRTO is to maintain a separate mental picture for each aerodrome and to switch quickly between them [6]. As described above, this concept can reduce the risk of boredom, but at the same time workload has to stay at a manageable level. Since workload is linked to performance (see the summary in [7]), identifying methodologies that support the estimation of workload is important.

Workload is the strain or impact of stressors on the ATCO, depending on his/her abilities, resources, etc. [8, 9]. An ATCO’s workload is expected to be cognitive in nature [10] and is therefore specified in this paper as mental workload [11]. Studies [12,13,14] measuring mental workload in ATC use subjective measurements for assessment. A common online measurement is the Instantaneous Self-Assessment (ISA) scale [15], which evaluates mental workload in real-time simulations using a five-point rating scale [16, 17]. Besides its subjectivity, a disadvantage of the ISA scale is its intrusiveness: it repeatedly interrupts the ATCO during task performance.

Promising alternatives that use objective measurements are the recording of electrophysiological brain activity (EEG) [e.g., 18,19,20], the structured analysis of the traffic situation [e.g., 21], and eye-tracking metrics (dwell time, fixation duration, pupil dilation) [e.g., 22, 23]. Even though the most common link between workload and eye-tracking metrics is pupil dilation [24,25,26], this paper focuses on dwell time, fixation duration, and transition frequency, because the pupillary response can also be influenced by external stimuli (e.g., lighting conditions) [24] that cannot be controlled in the out-of-the-window view of a tower or, in this case, the panoramic view (PV) of a remote tower. Dwell time, fixation duration, and transition frequency have been shown to be indicators of workload in other domains [26, 27]. Therefore, a priority of this paper is to examine the application of these existing metrics to the MRTO environment. If successful, this could provide a basis for future developments.

With the increase of automation in the ATC domain, the ATCO’s behavior might be the only direct input into the system that can be measured in the future. In consideration of the research presented above, the authors believe that, at present, the evaluation of eye-tracking metrics as an indicator of mental workload is the most promising approach. However, to the authors’ knowledge, no study has correlated subjective and objective measurements in a realistic scenario in the context of MRTO. Applying selected eye-tracking metrics as an objective measure of mental workload, as an extension of or replacement for subjective workload ratings, would therefore provide a good foundation for future MRTO.

Research on the connection between mental workload and eye-tracking metrics has already been conducted in other ATM domains, such as the cockpit [28] or weather displays [29]. Previous studies in MRTO [5] explored ATCO performance in relation to traffic volume and traffic complexity in a real-time simulation environment. The present study focuses on eye-tracking metrics as possible objective indicators of mental workload during MRTO. It is part of exploratory research with different interface designs and workload configurations, aiming to identify possible influences of the workplace and to evaluate subjective against objective measurements. Three eye-tracking metrics associated with workload, namely fixation frequency, average fixation duration, and average transition frequency, were selected [27] to extend the results to the MRTO environment. The German Aerospace Center (DLR) conducted this study [30] in collaboration with the air navigation service providers Hungaro Control (HC) and Oro Navigacija (ON) and the ATC system manufacturer Frequentis.

2 Research Questions

Based on the existing literature and the eye-tracking metrics applied and evaluated in different domains, the following research questions (RQ) are proposed. RQ1: Is the amount of traffic connected to the subjective mental workload? RQ2: How does the arrangement of information influence the dwell time on the different Areas of Interest (AoI)? RQ3: Is the subjective rating of mental workload correlated with the selected eye-tracking metrics?

With regard to RQ1, we hypothesize that more traffic leads to an increase in subjective workload, similar to the model assumption of [21]. We also hypothesize that a rearrangement of information changes the information-gathering process [17], especially the distribution of dwell time across the workplace (RQ2). As an extension of RQ2, it is also important to see how the shift in information gathering is connected to the rearrangement, which could provide valuable insights into information gathering during MRTO in general. With regard to RQ3, we assume that the selected metrics will show a connection to the subjective measurement.

3 Method

A study including eye-tracking measurement was conducted on a prototype workplace for MRTO in a simulated real-time environment. Two groups of participants had to control three aerodromes at the same time and manage the traffic in different scenarios designed to induce workload. Eye-tracking data was collected and analyzed using the selected metrics. In the following section the details of the study are presented.

3.1 Participants

For this study, two groups of participants were recruited. The first group consisted of 6 ATCOs (4 male/2 female) from HC, who participated in the study from 12 to 22 November 2018. Their age was between 29 and 59 years (M = 42.6, SD = 9.55) and their working experience between 7 and 36 years (M = 17.7, SD = 9.6). The second group consisted of 6 ATCOs (all male) from ON, who participated from 4 to 11 December 2018. Their age was between 25 and 37 years (M = 29.6, SD = 3.9) and their working experience between 1.5 and 8 years (M = 3.9, SD = 2.4). All participants in both groups were active ATCOs, and none had previously participated in an MRTO study. The ATCOs participated voluntarily during working hours at their respective companies. The study was performed in accordance with the General Data Protection Regulation (EU) 2016/679.

3.2 Design and Material

The study was conducted in the TowerLab [5] at the Institute of Flight Guidance, DLR. Figure 1 presents the workplace with the screens used and their names. The simulation, including traffic and realistic flight information, was provided by the traffic simulator NARSIM [31] and supported by the MRTO planning tool (FlightStrip) developed by Frequentis. The PV (208° horizontal and 32° vertical) was divided into three rows, showing one medium-sized airport (Main) and two small airports (Second and Third). The pan-tilt-zoom (PTZ) camera view of each airport was placed on the right side of its PV row. The radar screens for each airport (RADAR) were positioned between the PV and FlightStrip. To increase distinctiveness, the interfaces of each airport (PV, PTZ, RADAR, and strip bay on the FlightStrip screen) were color coded with a specific airport color. Radio communication was performed via headset, and the frequencies for all airports were coupled.

Fig. 1. MRTO real-time simulation platform with planning tool and defined AoIs (yellow frames) for group one (traffic group 20).

A between-subjects design was used with the factors “arrangement of information” and “traffic amount”. To reduce learning effects, each scenario had slightly different scheduled traffic and a special event. The events were chosen to produce a similar expected workload and were designed using feedback from experts at HC and ON. The traffic distribution was 50% at the main airport and 25% at each of the second and third airports. In two cases, the event included a variation in the traffic distribution.

Two events included the handover of one airport to another ATCO; data recorded after the handover were not used for the analysis. Each participant had to control a minimum of 4 scenarios with 3 active aerodromes in parallel. The operational mode was always normal (no degraded modes or special procedures), and it was always daytime at each aerodrome. None of the scenarios included clouds, and visual meteorological conditions (VMC) always prevailed.

The participants were divided into two traffic groups with an average of 20 and 28 aircraft per hour, respectively (90% under instrument flight rules, IFR). The traffic groups are named after their traffic amount (20 and 28). The arrangement of information for traffic group 20 is presented in Fig. 1. For traffic group 28, PV_Main and PV_Third were interchanged, as were PTZ_Main and PTZ_Third.
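As a quick check of what these traffic amounts mean per aerodrome, the following minimal sketch applies the nominal 50/25/25% distribution to the two traffic groups. The names are hypothetical, and the two scenario events that altered the distribution are ignored.

```python
# Hypothetical helper: split the average hourly traffic of a traffic group
# across the three aerodromes using the nominal 50/25/25% distribution.
# Scenario events that changed the distribution are not modeled.
TRAFFIC_SHARE = {"Main": 0.50, "Second": 0.25, "Third": 0.25}
IFR_SHARE = 0.90  # 90% of the traffic operated under instrument flight rules


def movements_per_airport(total_per_hour: float) -> dict[str, float]:
    """Return the average number of aircraft per hour at each aerodrome."""
    return {airport: total_per_hour * share for airport, share in TRAFFIC_SHARE.items()}


for group in (20, 28):
    split = movements_per_airport(group)
    print(f"traffic group {group}: {split}, IFR approx. {group * IFR_SHARE:.1f}/h")
```

Under these assumptions the main airport handles about 10 aircraft per hour in traffic group 20 and about 14 in traffic group 28, i.e., 4 additional movements per hour, compared with 2 additional movements at each small aerodrome.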

3.3 Procedure

Participants in both groups were scheduled in pairs for two days each. After arrival, the participants received a briefing on the MRTO workplace, the procedures, and their task, and gave written consent to the recording of personal data. Each participant performed a training session of approximately 40 min to become familiar with the arrangement of information and especially with the FlightStrip interface. The duration of each run varied between 40 and 45 min, depending on the participant. Each participant controlled three aerodromes at the same time. Before each run, the eye-tracking glasses were calibrated to record the eye movements of the ATCO in charge. During each run, the participants answered the ISA scale every five minutes. One participant of each pair was randomly selected to start with the first scenario while the other completed questionnaires; after the first run, the participants switched places. This alternating procedure was repeated until each participant had finished 4 scenarios. Traffic group 20 additionally completed a fifth, emergency scenario that is not part of this paper. Each pair of participants was debriefed together.

3.4 Data Analysis

We focus our analysis on descriptive data rather than on inferential statistics between the groups. This is due to the small sample size and the exploratory experimental design, which varies two factors between the two groups. The influences of the two factors cannot be separated, and therefore the Results section has to account for both. The same applies to our interpretation of the results in the Discussion and Conclusions sections.

The eye-tracking data were analyzed with the Eye-Tracking Analyzer software by DLR [32], which uses a velocity-based fixation detection algorithm [33] to separate fixations and saccades. This was necessary to reduce misclassification in the raw data due to the large number of AoIs and possibly intertwined scan paths. The velocity threshold was set individually for each run at the velocity of the fastest five percent of movements within that run. The selected AoIs are presented in Fig. 1; eye-tracking data were only considered valid within these AoIs. Because the runs differed in duration, only the first 40 min of each run were used for data analysis.
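The velocity-based separation of fixations and saccades described above can be sketched as a simple velocity-threshold identification (I-VT). This is not the implementation of the DLR Eye-Tracking Analyzer [32], whose internals are not described here. It is a minimal sketch that assumes gaze samples given as angles in degrees with timestamps in seconds, and it reads the per-run threshold as the 95th percentile of sample-to-sample velocities. The function names and the optional `min_duration` filter are hypothetical.

```python
import numpy as np


def _summarise(t, x, y, first, last):
    """Collapse samples first..last (inclusive) into one fixation record."""
    return (t[first], t[last], x[first:last + 1].mean(), y[first:last + 1].mean())


def detect_fixations(t, x, y, percentile=95.0, min_duration=0.0):
    """Velocity-threshold (I-VT) fixation detection with a per-run threshold.

    t: timestamps (s); x, y: gaze angles (deg) of ONE run.
    The threshold is the given percentile of all sample-to-sample velocities
    in the run, so roughly the fastest (100 - percentile)% of movements are
    treated as saccades. Returns (start_s, end_s, mean_x, mean_y) tuples.
    """
    t, x, y = (np.asarray(a, dtype=float) for a in (t, x, y))
    velocity = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)  # deg/s
    threshold = np.percentile(velocity, percentile)
    slow = velocity <= threshold  # interval i links samples i and i + 1

    fixations, start = [], None
    for i, is_slow in enumerate(slow):
        if is_slow and start is None:
            start = i                       # a run of slow intervals begins
        elif not is_slow and start is not None:
            fixations.append(_summarise(t, x, y, start, i))
            start = None                    # a saccade interrupts the run
    if start is not None:                   # close a fixation at the end of the run
        fixations.append(_summarise(t, x, y, start, len(slow)))
    return [f for f in fixations if f[1] - f[0] >= min_duration]
```

Computing the threshold per run, rather than using one fixed value, mirrors the individual threshold described in the text; real gaze data would normally also need noise filtering, which this sketch omits.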

The ISA scale is a five-point scale (1 = underutilized, 2 = relaxed, 3 = comfortable, 4 = high, 5 = excessive) used to indicate the level of workload over the past 5 min. As a general interpretation of the ISA scale, values below the scale average can be described as underload, whereas values above it are considered overload; both can lead to performance decrements [34, 35]. The Yerkes-Dodson law [36] states that there is an inverted-U relationship between arousal and performance, with optimal performance at moderate levels of arousal. The two ends of the ISA scale represent extreme values (boredom and overstrain) that are not desirable for longer periods; the optimal value is in the center of the scale. The participants received an audio signal indicating that they had to complete the questionnaire and were instructed to do so within 30 s. The ISA scale was presented in the upper left of the FlightStrip screen.
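The interpretation rule above can be written as a small lookup. This is a hypothetical helper (the names `ISA_LABELS`, `SCALE_CENTER`, and `workload_band` are illustrative), assuming that the “scale average” is the midpoint value 3.

```python
ISA_LABELS = {1: "underutilized", 2: "relaxed", 3: "comfortable", 4: "high", 5: "excessive"}
SCALE_CENTER = 3  # midpoint of the five-point ISA scale (assumed "scale average")


def workload_band(rating: int) -> str:
    """Map an ISA rating to underload / optimal / overload relative to the scale center."""
    if rating not in ISA_LABELS:
        raise ValueError(f"ISA rating must be an integer from 1 to 5, got {rating!r}")
    if rating < SCALE_CENTER:
        return "underload"
    if rating > SCALE_CENTER:
        return "overload"
    return "optimal"


for r, label in ISA_LABELS.items():
    print(r, label, workload_band(r))
```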

4 Results

The following subsections are derived from the research questions. Due to technical issues, participant one from group 20 and participants one and two from group 28 were not recorded. Moreover, one run of participant two from group 20 was not recorded. Because of the relatively small number of observations for the ISA category “excessive” (group 20: N = 10; group 28: N = 25, compared with an average of 140 observations per ISA value), value 5 is not analyzed further.

Traffic Amount and Workload

The first analysis concerns the amount of traffic in relation to workload (RQ1). Figure 2 presents the descriptive data relevant to RQ1. On average, participants in group 28 rated their workload higher and closer to the optimum of “comfortably busy” than participants in group 20. The value “underutilized” was selected in 15.9% (25 in group 20 and 14 in group 28) of all 245 situations. In total, 2.8% of the ratings (2 in group 20 and 5 in group 28) classified the situation as “excessive”.

Fig. 2. Average ISA score per traffic group. The numbers at the bottom represent the number of observations for each group. Error bars show the standard error of the mean. The dotted line indicates the scale average.

4.1 Distribution of the Areas of Interest

The second analysis concerns the influence of the arrangement of information on the dwell time on the different AoIs (RQ2). Figure 3 shows the distribution of dwell times across the defined AoIs (see Fig. 1). Invalid eye data were removed for each participant, and dwell times were expressed as percentages per participant to allow comparison.
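The normalization step described above can be sketched as follows. This minimal sketch assumes that each fixation has already been assigned to an AoI (or to `None` when it fell outside all AoIs) and approximates dwell time as the summed fixation duration per AoI; the function name and the sample values are hypothetical illustrations, not study data.

```python
from collections import defaultdict


def dwell_time_percentages(fixations):
    """Percentage of valid dwell time per AoI for one participant.

    fixations: iterable of (aoi, duration_s); aoi is None when the gaze
    fell outside all defined AoIs (treated as invalid and dropped).
    """
    totals = defaultdict(float)
    for aoi, duration in fixations:
        if aoi is None:
            continue
        totals[aoi] += duration
    valid_total = sum(totals.values())
    if valid_total == 0:
        return {}
    return {aoi: 100.0 * d / valid_total for aoi, d in totals.items()}


sample = [("PV_Main", 2.0), ("FlightStrip", 1.0), (None, 5.0), ("PV_Main", 1.0)]
print(dwell_time_percentages(sample))  # {'PV_Main': 75.0, 'FlightStrip': 25.0}
```

Normalizing by each participant's own valid total makes participants with different amounts of invalid data comparable, which is the purpose stated in the text.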

Fig. 3. Dwell time in percentage for each AoI per traffic group. Error bars show the standard error of the mean.

The results indicate two different scanning behaviors, with the largest difference on the PV AoIs. The dwell times for traffic group 20 show an almost equal distribution of attention between PV_Main (M = 9.34, SD = 9.6), PV_Second (M = 9.77, SD = 4.63), and PV_Third (M = 11.12, SD = 3.87). By contrast, the dwell times for traffic group 28 show that attention is distributed in line with the traffic handled at each airport: PV_Main (M = 31.76, SD = 7.53), PV_Second (M = 5.25, SD = 3.42), and PV_Third (M = 2.68, SD = 2.05).

Table 1 shows the mean dwell time percentages for the AoIs FlightStrip and PV_Main, separated by traffic group, static mask, and ISA value. The results show that in traffic group 28 the subjective workload influences the distribution between FlightStrip and PV_Main, whereas in traffic group 20 the dwell time percentages seem stable.

Table 1. Mean dwell time percentages separated by traffic group, static mask, and ISA value.

4.2 Fixation Frequency and Average Duration

The third analysis concerns the eye-tracking metrics average number of fixations per minute and average fixation duration per minute in relation to subjectively perceived workload. With respect to RQ3, the number of fixations and their average duration were selected as metrics to compare against the ISA scale. Because the ISA scale is applied every five minutes (to evaluate the past five minutes), the five minutes before each valid answer were divided into five one-minute segments, and the average number of fixations was calculated for these segments. The same procedure was applied to the average fixation duration and to the transition metrics in Sect. 4.3.
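The segmentation procedure above can be sketched as follows. This is a minimal sketch under assumptions not stated in the text: fixations are given as (start, duration) pairs in seconds, each fixation is assigned to the segment containing its start time, and the per-rating result is averaged over all one-minute segments that precede responses with that rating. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def minute_segments(fixations, response_time, window_min=5):
    """Yield (n_fixations, mean_duration_s) for each one-minute segment of the
    window_min minutes preceding an ISA response at response_time (s).

    fixations: list of (start_s, duration_s); a fixation is assigned to the
    segment that contains its start time.
    """
    for k in range(window_min):
        seg_start = response_time - (window_min - k) * 60.0
        seg_end = seg_start + 60.0
        durations = [d for start, d in fixations if seg_start <= start < seg_end]
        yield len(durations), (mean(durations) if durations else None)


def fixation_metrics_by_isa(fixations, isa_responses, window_min=5):
    """isa_responses: list of (response_time_s, rating).

    Returns {rating: (mean fixations per minute, mean fixation duration s)},
    averaged over all one-minute segments that precede a response with that rating.
    """
    counts, durations = defaultdict(list), defaultdict(list)
    for response_time, rating in isa_responses:
        for n, d in minute_segments(fixations, response_time, window_min):
            counts[rating].append(n)
            if d is not None:  # empty segments have no duration
                durations[rating].append(d)
    return {
        r: (mean(counts[r]), mean(durations[r]) if durations[r] else None)
        for r in counts
    }
```

The same segment loop could be reused for the transition metrics by swapping the per-segment measure.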

Figure 4 shows the average number of fixations per traffic group and ISA value. On average, traffic group 20 fixated more often than traffic group 28 when underutilized or relaxed (ISA values 1 and 2). The same pattern is visible for high subjective workload (ISA value 4).

Fig. 4. Average number of fixations per ISA value, separated by traffic group. Error bars show the standard error of the mean.

Figure 5 complements the results from Fig. 4. As in Fig. 4, the value “excessive” was excluded from the analysis because of the small number of observations. The average fixation duration per minute is longer for traffic group 28 than for traffic group 20 at underutilized, relaxed, or high subjective workload. For the ISA value “comfortable”, both the average number of fixations and the average fixation duration are similar for the two groups. The results from Figs. 4 and 5 indicate that reduced workload leads to a higher number of shorter fixations.

Fig. 5. Average duration of fixations per ISA value and traffic group. Error bars show the standard error of the mean.

4.3 Transition Frequency

The final analysis extends the previous analyses by incorporating the AoIs and therefore the MRTO task (RQ3). The MRTO task is strongly connected to the order in which the ATCO gathers information from the workplace. Assuming that each AoI represents one possible source of information, the order in which information is gathered reveals the ATCO’s strategy. Therefore, consecutive fixations on the same AoI are combined into macro fixations.

Because the workload of the task should influence the macro fixations, they extend the search for eye-tracking metrics that could help determine workload, as proposed by RQ3. The number of transitions between AoIs and the average transition duration were selected as possible metrics for the following analysis; a transition and its duration are recorded whenever the fixation changes from one AoI to another. The results in Fig. 6 show that the number of transitions per minute decreases for traffic group 20 with increasing subjective workload. Traffic group 28 has a low number of transitions while underutilized; with increasing subjective workload, the number of transitions seems to increase. Figure 7 presents the average duration in seconds from the last fixation on an AoI to the first fixation on a different AoI. Traffic group 20 has higher values than traffic group 28 for subjective workload below “comfortable”.
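The macro-fixation and transition metrics described above can be sketched as follows. This minimal sketch assumes chronologically sorted fixations given as (AoI, start, end) in seconds, already restricted to valid AoIs, and measures a transition from the end of the last fixation on one AoI to the start of the first fixation on the next AoI. The function names are hypothetical.

```python
def macro_fixations(fixations):
    """Merge consecutive fixations on the same AoI into macro fixations.

    fixations: chronologically sorted list of (aoi, start_s, end_s), already
    restricted to valid AoIs. Returns a list of the same shape.
    """
    merged = []
    for aoi, start, end in fixations:
        if merged and merged[-1][0] == aoi:
            merged[-1] = (aoi, merged[-1][1], end)  # extend the current macro fixation
        else:
            merged.append((aoi, start, end))
    return merged


def transition_metrics(fixations, minutes):
    """Return (transitions per minute, mean transition duration s).

    A transition is counted whenever consecutive macro fixations lie on
    different AoIs; its duration runs from the end of the last fixation on
    the old AoI to the start of the first fixation on the new one.
    """
    macro = macro_fixations(fixations)
    durations = [nxt[1] - prev[2] for prev, nxt in zip(macro, macro[1:])]
    rate = len(durations) / minutes
    mean_duration = sum(durations) / len(durations) if durations else None
    return rate, mean_duration
```

Because consecutive macro fixations always lie on different AoIs by construction, every adjacent pair counts as one transition.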

Fig. 6. Average number of transitions per ISA value, separated by traffic group. Error bars show the standard error of the mean.

Fig. 7. Average transition duration per ISA value, separated by traffic group. Error bars show the standard error of the mean.

5 Discussion

The results need to be summarized and interpreted for each research question individually. As mentioned above, because the conditions for inferential statistical analysis were not met, the analysis was restricted to descriptive data. The results are therefore open to discussion and are interpreted in this section. The exploratory nature of the experiment is reflected in the expert sample, the relatively small sample size, and the experimental design that varied two factors at the same time. Because of the small sample size, the development of additional metrics was not pursued. Since the influences of the arrangement of information and the amount of traffic cannot be clearly distinguished, this section has to take both into account.

5.1 Amount of Traffic Connected to Subjective Workload

As expected for RQ1, the results indicate that a higher amount of traffic increases subjective workload. For both traffic groups, and therefore also for both arrangements of information, the average subjective workload is below the scale average, which means that the participants were more often insufficiently challenged.

5.2 Arrangement of Information

RQ2 predicted an influence of the arrangement of information on the dwell time distribution at the MRTO workplace, as has already been shown in other domains [e.g., 26, 37]. The distribution of dwell time per AoI seems to differ between the traffic groups. The dwell time distribution of traffic group 20 suggests equal monitoring of the PV for each airport. Because of the order of the PV rows, the participants in traffic group 20 had to visually cross PV_Second and PV_Third while switching between PV_Main and FlightStrip. In contrast, traffic group 28 could switch between PV_Main and FlightStrip without crossing any other PV.

Considering that the same 50/25/25% traffic distribution was used in both traffic groups, the dwell time distribution should be similar for each group. Even though [17] showed that workload influences information-gathering strategies and can lead to a shift in the sources used for information gathering, the sum of the PV areas is comparable for both traffic groups. This indicates a different distribution of dwell time within the PV rather than a shift of attention to a different system (e.g., the radar) as a source of information. A possible explanation is the increased probability of simultaneous aircraft movements at the main airport. This could also lead to an exponential rather than the expected linear increase of attention in traffic group 28.
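The explanation above, that simultaneous movements become disproportionately more likely at the main airport, can be illustrated with a toy calculation. This is not part of the study: it assumes, hypothetically, that movements at an aerodrome follow a Poisson process and treats two or more movements within the same five-minute window as “simultaneous”, so that $P(N \ge 2) = 1 - e^{-\lambda}(1 + \lambda)$ with $\lambda$ the expected number of movements per window. The main airport receives 50% of the 20 or 28 movements per hour.

```python
import math


def p_simultaneous(movements_per_hour: float, window_min: float = 5.0) -> float:
    """Probability of at least two movements in one window, assuming
    Poisson-distributed movements (a hypothetical model, not from the study)."""
    lam = movements_per_hour * window_min / 60.0  # expected movements per window
    return 1.0 - math.exp(-lam) * (1.0 + lam)


# Main airport: 50% of 20 vs. 28 movements per hour.
p20, p28 = p_simultaneous(10), p_simultaneous(14)
print(f"P20 = {p20:.3f}, P28 = {p28:.3f}, ratio = {p28 / p20:.2f} (traffic ratio 1.40)")
```

Under these assumptions the probability of simultaneous movements rises by a factor of about 1.6 while the traffic rises by a factor of 1.4. This is faster than linear, although not exponential in the strict sense; the sketch only shows that a disproportionate increase is plausible.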

5.3 Subjective Rating in Connection to Eye-Tracking Metric

The results in relation to RQ3 are based on the analysis of four metrics that form two complementary pairs. The first pair consists of fixation frequency and average fixation duration; the second pair consists of the task-related transitions between AoIs and their duration.

The results of the first metric pair correspond with the arrangement of information, as participants in traffic group 28 paid more attention to PV_Main. The literature [e.g., 27, 28] suggests that an increase in fixation frequency and a decrease in average fixation duration are associated with an increase in workload. Similarly, transition frequency should increase and transition duration should decrease with increasing workload.

Especially at low subjective workload, the number and duration of fixations in traffic group 28 indicate that these participants monitor fewer AoIs for longer periods. The dwell time percentages for the category “underutilized” support this: traffic group 28 directs almost 80% of its attention to FlightStrip and PV_Main alone. The effect diminishes when workload increases to a medium level, assuming that this level of workload is only reached when traffic at all three aerodromes must be handled in parallel, which forces the ATCO to observe all PV AoIs.

We interpret the results of the second metric pair as follows: the transition frequency per minute increases with increasing subjective workload for traffic group 28, whereas it decreases for traffic group 20 until the ISA value “comfortable” is reached. Concordantly, the average transition duration per minute is lower for traffic group 28 until the ISA scale average is reached. Like the first metric pair, this indicates that during underutilized phases, traffic group 28 concentrates only on two AoIs that are close together in the workplace arrangement.

The selected metrics for RQ3 seem to indicate subjective workload within a relatively narrow window. This extends previous work on eye metrics as workload indicators [26, 27], but also draws attention to the arrangement of information as an important factor that has to be considered before applying the selected eye metrics.

6 Conclusions

In summary, this exploratory study aimed to evaluate three RQs in relation to the MRTO workplace. The study analyzed the subjective workload and the eye movements of two traffic groups with different arrangements of information. To induce a wide spectrum of subjective workload, the number of movements in both traffic groups was higher than generally intended in the MRTO concept. The results of this study cannot be used to identify a limit on the number of movements for one ATCO operating MRTO. Likewise, they cannot be used to determine that MRTO is limited to three aerodromes at the same time. Safety and efficiency in ATC depend on a variety of factors (e.g., flight operation modes, traffic situation, weather conditions) that were not systematically varied in this study.

With regard to subjective workload, one conclusion is that the ISA scale must be interpreted carefully, especially below and above the scale average. Underload, moderate load, and overload seem to influence eye movements quite differently and have to be evaluated level by level.

The authors believe that the most suitable dwell time distribution for the MRTO concept is similar to that of traffic group 20. Even if traffic also influences the dwell time distribution, this study suggests that an arrangement placing the less important AoIs between the most important ones is more promising. This provides an indication for the design of the workplace. Furthermore, the study could provide valuable indications regarding eye metrics as objective workload measures. In summary, the successful application of eye metrics depends on the scan path and on the arrangement of information, but seems predictable for a fixed set-up.

Future work should concentrate on an experimental design that allows for inferential statistical analysis of the individual factors described in this paper. This should be followed by a systematic and detailed analysis of additional factors relevant to safe and efficient ATC.