1 Introduction

For commercial aircraft, airworthiness refers to the certification and supervision of the design, manufacture, implementation and maintenance of the aircraft, carried out on behalf of the public according to the airworthiness regulations and materials [1]. Its aim is to ensure that the aircraft achieves the safety level that the regulations require. Typically, the design of a commercial aircraft should comply with the Certification Specifications for Large Aeroplanes (CS-25), issued by the European Aviation Safety Agency [2].

Human factors are among the most important threats to aviation safety. According to statistics, over 70% of flight accidents have been attributed to human factors [3]. Several airworthiness regulations in CS-25 concern human factors. Among them, CS25.1523 (Minimum Flight Crew) is one of the most important: it stipulates that the number of flight crew should be determined on the basis of the workload on individual crew members. In other words, to show compliance with CS25.1523, the workload of each flight crew member should be measured. The recommended means of compliance include simulator tests and flight tests.

Traditional workload measurements for flight crew are typically of four types: timeline analysis, task performance measures, subjective rating scale measures and psychophysiological measures [4]. Timeline analysis can be used as an analytic tool to make a priori predictions of the task demands imposed on the crew [5]. Based on micro-motion techniques borrowed from industrial engineering, it computes workload as the ratio of the time required to complete the necessary tasks to the time available. Boeing Commercial Airplanes used the timeline analysis technique in simulator studies during the design of several aircraft types [6]. Task performance measures can be classified into two major types: primary task measures and secondary task measures [7]. Performance on the primary task is always of interest because its generalization is central to the study. Speed, accuracy, response times, and error rates are often used to assess primary task performance [8]. Bliss and Dunn found support for the hypothesis that increasing primary task and alarm task workload degrades alarm response performance [9]. The secondary task technique gives operators an additional information-processing task to perform in conjunction with the task of interest. The rationale is that, by applying an extra load that raises the total information-processing demand beyond the operator’s capacity, workload can be measured as the difference between single-task and dual-task performance [10]. Wester et al. examined the impact of a secondary auditory oddball task on a primary lane-keeping driving task [11]. By studying how simultaneous information conflicts from multiple secondary in-vehicle tasks affect the primary task of driving, Lansdown and Brook-Carter suggested that overloading the visual channel results in performance decrements [12].
Subjective rating scale measures assume that an increased expenditure of resources is linked to perceived effort and can be appropriately assessed by the individual. NASA-TLX, the Bedford scale, and the Modified Cooper-Harper scale are the most popular. Schnell et al. evaluated synthetic vision information systems on the flight deck using NASA-TLX [13]. Pilot workload ratings, assessed with the Bedford scale under a range of wind-over-deck conditions, have been used to develop the Ship-Helicopter Operating Limits for a Lynx-like helicopter and the SFS2 [14]. Physiological measures use the physical reactions of the body to objectively measure the amount of mental work a person is experiencing. An objective measurement would seem to be the most exact, and therefore the best, way to determine workload because, unlike subjective measures, it does not require a direct response from the person [15]. Among physiological measures, eye activity and cardiac activity have received the most research attention. Heart rate is considered the most common and reliable measure of workload; generally, heart rate increases as workload increases [16]. Eye activity, including pupil diameter and eye blink rate, can also indicate workload. Normally, pupil diameter increases with increasing mental workload, and eye blink rate decreases with increasing workload [17].
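
The timeline-analysis ratio described above (time required divided by time available) can be illustrated with a minimal sketch. The function name `timeline_workload`, the task durations and the 60-second window are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
def timeline_workload(task_durations_s, time_available_s):
    """Return workload as time required / time available.

    task_durations_s: durations (seconds) of the tasks that must be
    completed inside the window.
    time_available_s: length of the window (seconds).
    """
    if time_available_s <= 0:
        raise ValueError("time_available_s must be positive")
    time_required_s = sum(task_durations_s)
    return time_required_s / time_available_s


# Hypothetical example: three tasks that must fit in a 60 s window.
ratio = timeline_workload([12.0, 18.0, 15.0], 60.0)
print(f"workload ratio = {ratio:.2f}")  # prints "workload ratio = 0.75"
```

A ratio approaching or exceeding 1 indicates that the tasks require as much as, or more than, the time available.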

Since flight tests may include high- or medium-risk scenarios, it is necessary to select a workload measurement that does not interfere with flight crew operation. Therefore, to determine suitable workload measurements for simulator and flight tests, a subjective rating scale measure (NASA-TLX) and a physiological measure (eye blink rate) were analyzed in this study. Nine pilots forming six flight crews participated in the test, which comprised three flight scenarios: Standard Instrument Departure (SID), Manual Departure (MD), and Standard Instrument Approach (SIA).

2 Method

2.1 Subjects

Nine Chinese male pilots ranging in age from 30 to 50 (Mean = 41.3 ± 5.23) were invited to participate in this experiment. The pilots were either commercial airline pilots or flight instructors from China Eastern Airlines. All of them had served as captains or co-captains on particular aircraft types (5 on the B737, 4 on the B747). The pilots were paired into six flight crews. Three of the pilots served in two crews each, with a different responsibility in each: as Pilot Flying in one crew and as Pilot Monitoring in the other. Before the experiment, all subjects signed a consent form approved by the Institutional Review Board of Shanghai Jiao Tong University.

2.2 Apparatus

The experiment was carried out in a CRJ-200 full-flight simulator qualified to level D. All configurations in the flight simulator are identical to those of the real aircraft. The flight test was conducted in a real CRJ-200 aircraft, as shown in Fig. 1.

Fig. 1. Flight deck of CRJ-200

In addition to the flight simulator and the aircraft, a head-mounted eye tracker (Tobii Glasses, Sweden) with a sampling rate of 30 Hz was used to determine the eye blink rate of the subjects during the experiment, as shown in Fig. 2.

Fig. 2. Tobii Glasses
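
How a blink rate might be derived from a 30 Hz eye-tracker recording can be sketched as follows. This is only an illustration under stated assumptions: the text does not describe the blink-detection procedure, so the per-sample "eye closed" flag, the run-length limits, the function names and the blinks-per-minute unit are all hypothetical, and the recording is synthetic.

```python
SAMPLE_RATE_HZ = 30  # sampling rate reported for the eye tracker


def count_blinks(eye_closed, min_samples=2, max_samples=15):
    """Count runs of consecutive 'closed' samples of plausible blink length.

    eye_closed: sequence of booleans, one per sample (True = eye closed or
    pupil not detected). The run-length limits are illustrative assumptions.
    """
    blinks = 0
    run = 0
    for closed in list(eye_closed) + [False]:  # sentinel ends a final run
        if closed:
            run += 1
        else:
            if min_samples <= run <= max_samples:
                blinks += 1
            run = 0
    return blinks


def blink_rate_per_minute(eye_closed, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return blinks per minute over the whole recording."""
    if len(eye_closed) == 0:
        raise ValueError("empty recording")
    duration_min = len(eye_closed) / sample_rate_hz / 60.0
    return count_blinks(eye_closed) / duration_min


# Synthetic 60 s recording: one 5-sample closure every 6 s (10 blinks).
samples = []
for _ in range(10):
    samples += [False] * 175 + [True] * 5
rate = blink_rate_per_minute(samples)
print(f"{rate:.1f} blinks/min")  # prints "10.0 blinks/min"
```

The upper run-length limit keeps long closures or tracking dropouts from being counted as blinks; suitable limits would have to be tuned to the actual device and recording conditions.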

2.3 Procedure

To compare the workload measurements in the flight simulator and in the flight test, three flight scenarios were designed: Standard Instrument Departure (SID), Manual Departure (MD), and Standard Instrument Approach (SIA). Each flight crew carried out each scenario in both the flight simulator and the flight test. The configurations and operating procedures for the scenarios were the same in both environments and are described below.

  1. Standard Instrument Departure

The flight scenario was conducted at Chengdu Shuangliu International Airport. The task started when the pilots pressed the “TOGA (Takeoff/Go-around)” button. The subjects then advanced the throttle and kept accelerating. When the aircraft reached the rotation speed VR, the subjects rotated the aircraft and maintained a climb of approximately 3 degrees. When the aircraft reached an altitude of 1500 feet, the subjects were required to engage the autopilot and to keep monitoring the essential flight parameters until the aircraft had climbed to 10000 feet.

  2. Manual Departure

The flight scenario was conducted at Chengdu Shuangliu International Airport. The task started when the pilots pressed the “TOGA” button. The subjects then advanced the throttle and kept accelerating. When the aircraft reached the rotation speed VR, the subjects rotated the aircraft and maintained a climb of approximately 3 degrees. After confirming a positive rate of climb on the Primary Flight Display, the subjects were required to retract the landing gear and to continue climbing manually to 10000 feet.

  3. Standard Instrument Approach

The flight scenario was conducted at Chengdu Shuangliu International Airport. The task started 40 nautical miles from the descent point. After slowing to 145 knots and descending to 1500 feet, the aircraft was in the landing pattern. The subjects executed a CAT I standard instrument approach procedure and landed on the runway.

The simulator experiment was conducted before the flight test. In the first session, the subjects performed a standard instrument departure and a standard instrument approach. In the second session, they performed a manual departure and a standard instrument approach. After each task, every subject was asked to complete the NASA-TLX scale. In the flight test, the procedures were the same as in the flight simulator.

2.4 Statistical Analysis

SPSS 17.0 for Windows was used to process the experimental data. Analysis of variance (ANOVA) and correlation analysis were performed. Results were considered statistically significant when p < 0.05.
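
For readers without SPSS, the F statistic of an ANOVA can be sketched in a few lines. The sketch below assumes a simple one-way (between-groups) design, which may differ from the design actually run in SPSS; the function name `one_way_anova_f` and the scores are hypothetical and are not data from this study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of scores per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and n_total > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical workload scores for three scenarios (not study data).
sid = [18.0, 22.0, 20.0, 25.0, 15.0]
md = [21.0, 24.0, 19.0, 28.0, 23.0]
sia = [27.0, 31.0, 24.0, 35.0, 29.0]
f_stat, df_b, df_w = one_way_anova_f([sid, md, sia])
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

The p-value is then obtained from the F distribution with the two degrees of freedom, which a statistics package provides directly.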

3 Results

3.1 NASA-TLX Scales

For the NASA-TLX scores, the three flight scenarios differed significantly in the simulator experiment (F(2,12) = 3.01, p = 0.040). Standard Instrument Approach (SIA) had the highest average NASA-TLX score (Mean = 27.92, SD = 9.54), Standard Instrument Departure (SID) had the lowest (Mean = 19.85, SD = 5.08), and Manual Departure (MD) was in between (Mean = 22.58, SD = 7.32). Similarly, in the flight test, standard instrument approach had the highest NASA-TLX score (Mean = 33.42, SD = 10.24), manual departure was intermediate (Mean = 28.75, SD = 7.06), and standard instrument departure had the lowest (Mean = 25.42, SD = 9.00). However, the differences among the three flight scenarios in the flight test were not significant (F(2,12) = 3.01, p = 0.063). Furthermore, the difference between the simulator experiment and the flight test was significant for standard instrument departure (t = 2.43, p = 0.024) and for manual departure (t = 2.10, p = 0.047), but not for standard instrument approach (t = 1.36, p = 0.187). In addition, the NASA-TLX scores in the simulator and in the flight test were moderately correlated (R = 0.524, p = 0.001), as depicted in Fig. 3.

Fig. 3. The linear regression results of the NASA-TLX scales in simulator and in flight test
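
The correlation coefficient and regression line relating simulator scores to flight-test scores can be computed as in the following minimal sketch. The paired scores below are hypothetical and are not the data behind Fig. 3; the function names are likewise illustrative.

```python
import math


def _centered_sums(x, y):
    """Return (mean_x, mean_y, sxx, syy, sxy) for paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation coefficient of paired samples."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares (slope, intercept) of y regressed on x."""
    mean_x, mean_y, sxx, _, sxy = _centered_sums(x, y)
    if sxx == 0:
        raise ValueError("slope undefined for constant x")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical paired scores (simulator, flight test); not study data.
simulator = [15.0, 20.0, 22.0, 27.0, 30.0, 35.0]
flight = [24.0, 22.0, 30.0, 29.0, 38.0, 33.0]
r = pearson_r(simulator, flight)
slope, intercept = linear_fit(simulator, flight)
print(f"R = {r:.3f}, flight = {slope:.2f} * simulator + {intercept:.2f}")
```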

3.2 Eye Blink Rate

For eye blink rate, as shown in Fig. 4, the differences among the three flight scenarios were significant only in the simulator experiment (F(2,12) = 4.711, p = 0.016); in the flight test they were not significant (F(2,12) = 0.003, p = 0.997). In the simulator experiment, standard instrument departure had the highest average eye blink rate (Mean = 14.08, SD = 3.63), standard instrument approach had the lowest (Mean = 9.83, SD = 2.72), and manual departure was intermediate (Mean = 11.58, SD = 3.78). In the flight test, however, the differences among the scenarios were slight. Furthermore, when the simulator experiment and the flight test were compared for each flight scenario, only the difference for standard instrument departure was significant (t = 3.331, p < 0.01); the differences for manual departure (t = 1.457, p = 0.159) and standard instrument approach (t = 0.213, p = 0.834) were not. In addition, eye blink rate showed a weaker correlation between simulator and flight test (R = 0.242, p = 0.155).

Fig. 4. Eye blink rate for the three flight scenarios, standard instrument departure (SID), manual departure (MD) and standard instrument approach (SIA), in the simulator and in the flight test. The error bars represent the variation in eye blink rate among the subjects in the simulator or in the flight test.
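
The simulator-versus-flight comparisons above are reported as t statistics, but the text does not state which t-test was used. As one possibility only, the sketch below computes Student's independent two-sample t statistic with pooled variance; a paired test would be computed differently. The function name and the blink-rate values are hypothetical, not study data.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances, n - 1).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0:
        raise ValueError("t undefined when both samples are constant")
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical blink rates for one scenario (not study data).
simulator_rates = [14.0, 17.0, 11.0, 15.0, 13.0, 16.0]
flight_rates = [10.0, 12.0, 9.0, 13.0, 11.0, 8.0]
t_stat, df = student_t(simulator_rates, flight_rates)
print(f"t({df}) = {t_stat:.3f}")
```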

4 Discussion

Flight test is the most direct means of compliance in aircraft human factors airworthiness certification. However, it is not the preferred means, for three reasons. First, testing abnormal situations in flight might not be appropriate for safety reasons [18]. Second, in a real flight it is normally difficult to manipulate the operational environment, which may be required when applying a scenario-based approach. Third, human factors scenarios performed in flight tests are not easy to reproduce because the operational context cannot be fully controlled [19]. Therefore, simulator tests may be more appropriate than flight tests, especially for high-risk flight scenarios, and both should be examined from the standpoint of human workload to show compliance with airworthiness requirements.

However, the classic workload measurements have their own limitations. The repeatability and validity of subjective rating scale measures are sometimes uncertain, and the data manipulations involved are often questioned as inappropriate [20]. Moreover, the subjective feeling of workload was found to depend essentially on the time stress involved in performing the task, and only for time-stressed tasks [21]. For the task performance method, because of the compensatory effect of increased effort, performance alone is clearly not sufficient to assess the state of the operator [22], and other factors, such as strategy, affect performance and workload differently [23]. Psychophysiological measures are influenced by the ambient environment and task duration [24]. For example, in real flight most pilots prefer to wear sunglasses to block direct sunlight. Moreover, some studies have assumed that eye movement parameters can provide a sensitive measure of visual workload only [25]. Therefore, it is necessary to select suitable workload measurements according to the specific characteristics of simulator tests and flight tests.

NASA-TLX is a multidimensional rating scale that assesses a subject’s workload on six 100-point subscales, each related to a different aspect of workload: Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration [26]. It therefore gives a comparatively precise and comprehensive workload evaluation. In addition, eye measures have been found to be sensitive to intermediate levels of mental effort [27] and can produce reliable near-real-time indicators of workload in a flight simulator [28].
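
How an overall score can be formed from the six subscales is sketched below. The text does not state whether the raw (unweighted) or the weighted NASA-TLX procedure was used, so both are shown as assumptions; the weighted form uses the tallies from the 15 pairwise comparisons of the standard procedure. The ratings, weights and function names are hypothetical.

```python
SUBSCALES = (
    "Mental Demand", "Physical Demand", "Temporal Demand",
    "Performance", "Effort", "Frustration",
)


def _check_ratings(ratings):
    """Validate that every subscale has a rating between 0 and 100."""
    for name in SUBSCALES:
        if name not in ratings:
            raise ValueError(f"missing rating for {name}")
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"rating for {name} must be in 0..100")


def raw_tlx(ratings):
    """Raw TLX: unweighted mean of the six subscale ratings."""
    _check_ratings(ratings)
    return sum(ratings[name] for name in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings, weights):
    """Weighted TLX: weights are pairwise-comparison tallies summing to 15."""
    _check_ratings(ratings)
    if sum(weights[name] for name in SUBSCALES) != 15:
        raise ValueError("weights must sum to 15")
    return sum(ratings[name] * weights[name] for name in SUBSCALES) / 15


# Hypothetical ratings and weights for one pilot after one task.
ratings = dict(zip(SUBSCALES, (30, 10, 40, 20, 35, 15)))
weights = dict(zip(SUBSCALES, (4, 1, 3, 2, 4, 1)))
print(f"raw TLX = {raw_tlx(ratings):.2f}")  # prints "raw TLX = 25.00"
print(f"weighted TLX = {weighted_tlx(ratings, weights):.2f}")  # prints "weighted TLX = 29.67"
```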

In this study, two types of workload measurement were compared in three flight scenarios, both in the flight simulator and in the flight test: a subjective method (NASA-TLX) and a psychophysiological measure (eye blink rate). The results demonstrated that NASA-TLX and eye blink rate were credible in the flight simulator. In the flight test, however, neither produced reliable indicators in these three scenarios. Two further aspects will be addressed in future studies. First, more measures will be implemented in both the simulator and the flight test environments, for instance subjective measures such as the Bedford scale and the Modified Cooper-Harper scale, and psychophysiological approaches such as ECG and EEG. Second, to ensure flight safety, only normal flight scenarios were selected in this study. Provided that safety can be ensured, more scenarios should be included, especially abnormal conditions such as crosswind handling and one-engine failure.