Keywords

1 Introduction

High demands on cognitive capacity and the ability to cope with workload are increasingly imposed on employees due to advanced information and communication technology, highly interactive work environments, and work assistance systems. Although the main goal of automation is to simplify work, employees increasingly complain about high mental workload and stress. Problems arise from information overload, frequent work interruptions, or a multitude of irrelevant information [5, 6, 9]. At the same time, automation and supervisory control tasks can be linked to monotonous work that reduces employees’ arousal [2, 3, 8, 10, 11]. Hence, the long-term negative consequences of inappropriate workload for individuals’ health constitute a serious problem for modern society. Furthermore, increased error rates due to inappropriate workload constitute a safety risk for other persons [12].

An objective method for continuous mental workload registration is therefore essential. Efficient execution of work tasks is only possible in an optimal workload range, which can be measured most directly where information processing takes place, i.e. in the brain. Neuronal workload measurement is therefore a key technology for optimizing work conditions in human-machine systems.

On the basis of monitoring the neuronal brain state it is possible for instance to define optimal task sharing between human and machine with efficient cognitive processing for the operator. The benefits for employees are maintenance of autonomy and working ability resulting from the moderate levels of mental workload when working with a human-machine system. Another important benefit is the prevention of negative impacts due to sustained overload or underload on the mental health and cognitive capacity of the working population.

This article describes the development of a continuous method for neuronal mental workload registration during the execution of cognitive tasks.

2 Methods

Cognitive tasks were conducted in a laboratory setting with the long-term goal of implementing a system capable of continuously monitoring an operator’s mental workload and recognizing critical states (e.g. high load and low load). The tests took place in the shielded lab of the Federal Institute of Occupational Safety and Health in Berlin.

During the execution of the tasks, which had diverse levels of complexity and difficulty, we registered the electroencephalogram (EEG) as well as further workload-relevant biosignal data (e.g. heart rate, blood pressure). The NASA-TLX was administered as a subjective questionnaire method [4]. Using a MATLAB toolbox consisting of modules for EEG pre-processing, segmentation, analysis and template generation, mental workload can be indexed according to Lei’s Logistic Function Model (LFM), and workload can be individually classified in the ranges of low, moderate and high load [7].

2.1 Procedure

Each subject completed the entire experiment in a single day. It consisted of two parts: a training phase and the main experiment. During the training phase, subjects were familiarized with the cognitive tasks. The cognitive tasks were the same as those of the main experiment but shorter in duration. They were repeated until the subject reached an accuracy index of at least 80 %. The training phase was intended to create similar individual starting conditions with respect to performance, so that we could investigate the effect of workload independently of learning effects.

The main experiment started after a short break subsequent to the training phase. The tasks were presented in the same counterbalanced order as presented during the training phase. At the beginning and at the end of the main experiment biosignal rest measurements of about 3 min took place. The experiments were controlled remotely through a remote desktop connection, an intercommunication system and a video monitoring system.

2.2 Subjects

The sample consists of 57 people in paid work and shows high variability in respect to the cognitive capacity and hence to the experienced mental workload. Three persons did not complete all cognitive tasks and had to be excluded from further analysis. Table 1 describes the sample set of 54 people used.

Table 1. Sample set

2.3 Tasks

The simulation of different cognitive task requirements is realized through the implementation of a task battery in the E-Prime application suite. The battery consists of tasks with diverse complexity and difficulty inducing different levels of mental workload. The implemented tasks are listed in Table 2.

Table 2. Task battery

In this paper we concentrate on the analysis and evaluation of four tasks: switch-PAR and switch-NUM as the easiest ones, switch-XXX as a switching task with working memory requirements and moderate workload [1], and AOSPAN as a demanding dual task (see Figs. 1, 2, 3, 4). The latter is a translated version of the AOSPAN task developed by [13]. The analysis of the rest measurements serves as a reference point measurement.

Fig. 1.

Switch-PAR as single task: The cue PAR will appear on the screen followed by a number. Please judge the number by its parity (i.e., whether it is even or odd): press the RED mouse button if the number is even, and press the GREEN mouse button if the number is odd.

Fig. 2.

Switch-NUM as single task: The cue NUM will appear on the screen followed by a number. Please judge the number numerically: press the RED mouse button if the number is greater than 5, and press the GREEN mouse button if the number is less than 5.

Fig. 3.

Switch-XXX as a switching task with working memory requirements: This time, the sequence of the task cues is fixed: NUM, NUM, PAR, PAR, NUM, NUM, and so on. Remember this sequence because now XXX will appear in place of the cue. To review, NUM: RED button = number greater than 5, GREEN button = number less than 5; PAR: RED button = even number, GREEN button = odd number. If you lose your rhythm, a cue will appear twice.

Fig. 4.

AOSPAN as dual task: Memorize a set of letters in the order presented while simultaneously solving math problems. Trials consist of 3 sets of each set size, with the set sizes ranging from 3 to 7.

2.4 Subjective Ratings

Subjective workload was captured with a computerized version of the NASA-TLX. After each task during the training phase, subjects were asked to rate the workload sources in 15 pairwise comparisons of NASA-TLX’s six workload dimensions: mental demand, physical demand, temporal demand, performance, effort, frustration. This required the subject to choose which dimension is more relevant to workload in the specific task. Hence, we gained an individual weighting of these subscales based on their perceived importance.
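The weighting step can be illustrated with a minimal Python sketch. It assumes the standard NASA-TLX scoring convention, in which each dimension's weight is the number of pairwise comparisons it wins and the overall score is the weighted mean of the six ratings; the text above does not spell out the combination rule, so this is an assumption. The function names `tlx_weights` and `weighted_tlx` are hypothetical.

```python
from itertools import combinations

# The six NASA-TLX dimensions named in the text.
DIMENSIONS = ["mental demand", "physical demand", "temporal demand",
              "performance", "effort", "frustration"]


def tlx_weights(choices):
    """Count how often each dimension was chosen in the 15 pairwise comparisons.

    `choices` maps each unordered pair (a, b) of dimensions to the
    dimension the subject judged more relevant to workload.
    """
    weights = {d: 0 for d in DIMENSIONS}
    for pair in combinations(DIMENSIONS, 2):  # 6 choose 2 = 15 pairs
        winner = choices[pair]
        if winner not in pair:
            raise ValueError(f"{winner!r} is not part of the pair {pair}")
        weights[winner] += 1
    return weights


def weighted_tlx(weights, ratings):
    """Weighted mean of the 0-100 ratings (assumed standard NASA-TLX scoring)."""
    total = sum(weights.values())  # equals 15 for a complete comparison set
    return sum(weights[d] * ratings[d] for d in DIMENSIONS) / total


# Illustrative (made-up) subject: the first dimension of each pair always wins.
example_choices = {pair: pair[0] for pair in combinations(DIMENSIONS, 2)}
example_weights = tlx_weights(example_choices)
example_ratings = {d: 50 for d in DIMENSIONS}
example_ratings["mental demand"] = 80
example_score = weighted_tlx(example_weights, example_ratings)
```

In this made-up example, mental demand receives the largest weight (5 of 15), so its higher rating pulls the overall score above the common rating of 50.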

After each task during the main experiment, subjects were asked again to rate the task within a 100-point range with 5-point steps. They indicated their rating by clicking on a 5-point step box with an optical mouse.

2.5 Physiological Measures

The electroencephalogram as well as the blood pressure (BP) and the heart rate (HR) were digitally recorded only during the main task.

EEG. The EEG was captured by 25 electrodes placed at positions according to the 10-20-system and recorded with reference to Cz and at a sample rate of 500 Hz. For signal recording we used an amplifier from BrainProducts GmbH and their BrainRecorder software.

The recorded EEG signal is windowed with a Hamming function and filtered with a bandpass filter (order 100) between 0.5 and 40 Hz. Subsequently, independent component analysis (ICA) is applied to the signal, and the calculated independent components are visually inspected and classified as either an artifact or a signal component. The signal components are projected back onto the scalp channels. The artifact-corrected EEG signal is transformed to average reference and cut into segments of 10 s length, overlapping by 5 s. Subsequently, the power in the workload-relevant frequency bands (\(\theta \): 4-8 Hz, \(\alpha \): 8-12 Hz) is computed over the segments using the Fast Fourier Transform (FFT).
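The segmentation and band-power step can be sketched in Python as follows. This is a minimal, dependency-free illustration and not the MATLAB toolbox itself: instead of a full FFT it evaluates the discrete Fourier transform directly at the bins inside each band, which gives the same coefficients for those bins. Filtering, ICA and re-referencing are omitted, no window is applied within a segment, and the power normalization (division by the squared segment length) as well as the names `segment` and `band_power` are assumptions.

```python
import cmath
import math


def segment(signal, fs, seg_s=10.0, step_s=5.0):
    """Cut a single-channel signal into 10 s segments that overlap by 5 s."""
    n = int(round(seg_s * fs))      # samples per segment
    step = int(round(step_s * fs))  # hop size between segment starts
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, step)]


def band_power(seg, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes for bins with f_lo <= f < f_hi.

    The DFT is evaluated directly at the bins of interest (a stand-in
    for the FFT) and the result is scaled by the squared segment length.
    """
    n = len(seg)
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = math.ceil(f_hi * n / fs)
    total = 0.0
    for k in range(k_lo, k_hi):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(seg))
        total += abs(coeff) ** 2
    return total / n ** 2
```

For a recording sampled at 500 Hz, `segment(channel, 500)` would yield segments of 5000 samples, and `band_power(seg, 500, 4, 8)` and `band_power(seg, 500, 8, 12)` would give the \(\theta \)- and \(\alpha \)-band values of one segment.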

Individual system training for each person is done on the basis of the \(\theta \)- and \(\alpha \)-band power distributions over the segments of the first minute of each task. The mean values computed for each person, task and frequency band are stored. Next, the cumulative distribution function over all training segments of each person is built and the previously stored mean values are used to extract the corresponding p-values from the cumulative distribution function. These p-values are averaged over all persons per task and a task specific overall p-value is gained for each frequency band. These overall p-values are then used for extracting the individual task specific q-values from the cumulative distribution functions of each person. Finally, the individual q-values and the NASA-TLX ratings are used for the individual parametrization of the system.
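One possible reading of this training procedure for a single frequency band is sketched below in Python. It interprets the "p-value" as the value of a person's empirical cumulative distribution function at the task mean, and the "q-value" as the quantile of that person's distribution at the group-averaged p-value; both interpretations, the data layout, and all function names are assumptions. The final fitting of \(b_0\), \(b_1\) and \(b_2\) from the q-values and NASA-TLX ratings is not shown, because the text does not describe it.

```python
from bisect import bisect_right
from math import ceil
from statistics import mean


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of training segments with power <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def quantile(sorted_values, p):
    """Smallest observed value whose empirical CDF is at least p."""
    n = len(sorted_values)
    idx = min(n - 1, max(0, ceil(p * n) - 1))
    return sorted_values[idx]


def train_q_values(power):
    """power[person][task] -> band-power values of the training segments.

    Returns the task-specific overall p-values (averaged over persons)
    and the individual task-specific q-values.
    """
    persons = list(power)
    tasks = list(power[persons[0]])
    # Pool all training segments of a person to build that person's CDF.
    pooled = {s: sorted(v for t in tasks for v in power[s][t]) for s in persons}
    # p-value: position of the task mean within the person's own distribution.
    p = {s: {t: ecdf(pooled[s], mean(power[s][t])) for t in tasks}
         for s in persons}
    overall_p = {t: mean(p[s][t] for s in persons) for t in tasks}
    # q-value: map the group-level p back onto each person's own scale.
    q = {s: {t: quantile(pooled[s], overall_p[t]) for t in tasks}
         for s in persons}
    return overall_p, q
```

The point of the detour through p-values is that absolute band power differs strongly between persons, whereas positions within each person's own distribution are comparable across the sample.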

Hence, after successful system training and generation of individual parameters for \(b_0\), \(b_1\), and \(b_2\) we get a personalized Logistic Function Model (LFM) [7] for each person:

$$ W = \frac{1}{1 + e^{-\left( b_0 + b_1 \cdot \theta + b_2 \cdot \alpha \right) } } \tag{1} $$

Here, the relative frequency values (\(\theta \), \(\alpha \)) can be applied and a workload index W for each segment calculated. Due to the nature of the logistic function, this workload index is in the range of 0 to 1. Segments with a workload index \(W \le 0.2\) are classified as low load segments, with \(0.2 < W < 0.8\) as moderate load segments, and with \(W \ge 0.8\) as high load segments. Hence, we obtain for each person and task three percentage values for the portion of the segments of each sector (LLS: low load segments, MLS: moderate load segments, HLS: high load segments).
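Equation (1) and the threshold rule can be written down as a short Python sketch. The coefficient values used in any call are hypothetical placeholders, since the individual parameters are obtained from the system training; the function names are likewise assumptions.

```python
import math


def workload_index(theta, alpha, b0, b1, b2):
    """Logistic Function Model, Eq. (1): W lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * theta + b2 * alpha)))


def classify(w):
    """Thresholds from the text: W <= 0.2 low, W >= 0.8 high, else moderate."""
    if w <= 0.2:
        return "low"
    if w >= 0.8:
        return "high"
    return "moderate"


def segment_proportions(indices):
    """Percentage of low (LLS), moderate (MLS) and high (HLS) load segments."""
    labels = [classify(w) for w in indices]
    n = len(labels)
    return {
        "LLS": 100.0 * labels.count("low") / n,
        "MLS": 100.0 * labels.count("moderate") / n,
        "HLS": 100.0 * labels.count("high") / n,
    }
```

For example, `segment_proportions([0.1, 0.5, 0.9, 0.95])` assigns one segment each to the low and moderate ranges and two to the high range, i.e. 25 %, 25 % and 50 %.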

Cardiovascular Parameters. Blood pressure was recorded continuously by the FMS Finometer Pro device. A finger cuff was placed around the subject’s finger and systolic and diastolic blood pressure as well as the heart rate were detected automatically. The recorded data was processed in the time domain.

2.6 Performance

We concentrated on the analysis of the individual accuracy rates for all four tasks. For AOSPAN, correct responses include the number of sets in which the letters are recalled in correct serial order and correct math problem solving.

2.7 Statistical Analysis

Six repeated-measures ANOVAs were carried out, one for each measure (\(\theta \), \(\alpha \), systolic BP, HR, accuracy rate and NASA-TLX), each with a single within-subject factor. For \(\theta \), \(\alpha \), systolic BP, and HR this factor had six levels (the four tasks and the two rest measurements); for accuracy rate and NASA-TLX it had four levels (the four tasks). Differences between the levels were examined with a Bonferroni post-hoc test.
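The size of the Bonferroni correction follows directly from the number of levels, as the Python sketch below illustrates. It only shows the adjustment of already computed pairwise p-values, not the ANOVA or the pairwise tests themselves; the level labels and function names are hypothetical.

```python
from itertools import combinations


def pairwise_levels(levels):
    """All unordered pairs of levels that a post-hoc test compares."""
    return list(combinations(levels, 2))


def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# Hypothetical labels for the levels described in the text.
TASKS = ["switch-PAR", "switch-NUM", "switch-XXX", "AOSPAN"]
WITH_REST = ["rest-begin"] + TASKS + ["rest-end"]

n_pairs_four_levels = len(pairwise_levels(TASKS))     # 4 * 3 / 2
n_pairs_six_levels = len(pairwise_levels(WITH_REST))  # 6 * 5 / 2
```

With four levels there are 6 pairwise comparisons and with six levels there are 15, so a raw p-value has to be correspondingly smaller to remain significant after correction.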

3 Initial Results

The results computed over 54 subjects, the four tasks, and the rest measurements will be presented in the following section. They comprise the obtained subjective ratings and task performance as well as the mental workload indexed segments from the EEG, the systolic BP and HR.

3.1 Subjective Ratings and Performance

Subjective Ratings. Figure 5(a) shows the average workload index for the selected tasks switch-PAR, switch-NUM, switch-XXX, and AOSPAN as representatives of two low, one moderate and one high workload task. Workload means changed significantly during the experiment (Greenhouse-Geisser F(5.96; 316.01) = 65.023, p\(<\)0.001). Post-hoc analysis revealed significant differences in the subjectively rated mean workload index between all tasks except between the two easy tasks.

Furthermore, the analysis of the NASA-TLX subscales indicates the predominant role of mental demand in the implemented task battery. Hence, the induced workload originates from information processing and should be reflected in the EEG.

Performance. Figure 5(b) shows the average accuracy rates for the selected tasks switch-PAR, switch-NUM, switch-XXX, and AOSPAN. Accuracy rate means changed significantly during the experiment (Greenhouse-Geisser F(3.71; 196.67) = 173.256, p\(<\)0.001). Post-hoc analysis revealed significant changes of the mean accuracy rates between all tasks.

Fig. 5.

(a) NASA-TLX computed for switch-PAR, switch-NUM, switch-XXX, and AOSPAN over 54 subjects. (b) Accuracy rates computed for switch-PAR, switch-NUM, switch-XXX, and AOSPAN over 54 subjects.

3.2 Physiological Measures

EEG. Analysis of the classified EEG segments shows that the proportion of high load segments increases and the proportion of low load segments decreases with increasing task difficulty. Means of LLS and HLS changed significantly during the experiment (Greenhouse-Geisser F(6.47; 0.28) = 20.89, p\(<\)0.001; Greenhouse-Geisser F(5.36; 289.16) = 23.24, p = 0.001). Results obtained from the assessment of the EEG segments are presented in Fig. 6.

Post-hoc analysis of the proportion of HLS showed that the means were significantly larger during the AOSPAN task than during all other measurements. Significant differences were also identified between the switch-XXX task and the switch-NUM task as well as between the switch-XXX task and the rest measurement at the end. The latter also differed significantly from the switch-PAR task.

The proportion of LLS revealed significant differences between AOSPAN and all other measurements. Similar behavior was observed for the switch-XXX task. Furthermore, the proportion of LLS in the rest measurement at the end was significantly larger than in switch-PAR and in the rest measurement at the beginning. No significant differences could be found between the easiest tasks switch-PAR and switch-NUM, nor between either of them and the rest measurement at the beginning.

Cardiovascular Parameters. Both systolic BP and HR differed between the measurements significantly (Greenhouse-Geisser F(4.45; 235.65) = 17.62, p\(<\)0.001; Greenhouse-Geisser F(5.89; 312.26) = 20.92, p\(<\)0.01).

According to post-hoc analysis, HR during the rest measurement at the end was lower than during all four tasks. HR during the rest measurement at the beginning was significantly lower than during switch-NUM, switch-XXX, and AOSPAN. Furthermore, significant differences in HR could be found between the tasks except between switch-XXX and AOSPAN.

Systolic BP means were significantly larger during the AOSPAN task than during switch-PAR, switch-NUM, and the rest measurements. Additionally, they were significantly larger during switch-XXX than during the two easier switch tasks and the rest measurements. No significant differences could be found between the easy switch tasks switch-PAR and switch-NUM. Furthermore, there were no significant differences between the rest measurements at the beginning and at the end, between the rest measurement at the end and the two easier switch tasks, or between the rest measurement at the beginning and switch-PAR.

Results of systolic BP and HR are presented in Fig. 7(a) and (b).

Fig. 6.

EEG - proportion of LLS (a) and HLS (b) computed for switch-PAR, switch-NUM, switch-XXX, and AOSPAN over 54 subjects.

Fig. 7.

Systolic BP (a) and HR (b) computed for switch-PAR, switch-NUM, switch-XXX, and AOSPAN over 54 subjects.

4 Discussion

The registration of mental workload by means of the EEG is the central issue addressed by this paper. We induced different levels of mental workload on the basis of a task battery but for the sake of convenience, we concentrated here on the switch-PAR, switch-NUM, switch-XXX and AOSPAN tasks. Cognitive requirements of the first two tasks are quite low and the tasks can be assumed to be representative of an easy task. The switch-XXX task is more demanding due to higher requirements on the working memory and rule switching. It can be classified as a moderate to difficult task but not as challenging as the AOSPAN task. The AOSPAN task demands memory control while dealing with distraction due to the math problem solving. It is a dual-task with high workload requirements.

Subjective ratings derived from the NASA-TLX questionnaire demonstrate significant workload differences between the more demanding tasks (switch-XXX and AOSPAN) and the easy tasks. No significant difference in experienced workload could be identified between the two easy tasks switch-PAR and switch-NUM. Accuracy rates show significant differences between all tasks, with remarkably larger gaps between the difficult AOSPAN task and all others and between the moderate task and the two easy tasks. Although there is a significant difference between switch-PAR and switch-NUM, both tasks are clearly located at the low workload level compared to the other two tasks. However, the switch-PAR task appears to be slightly more difficult than the switch-NUM task.

Cardiovascular parameters indicate significant differences between the more demanding tasks (switch-XXX and AOSPAN) and the two easy tasks. They also show significant differences between both demanding tasks and the rest measurements at the beginning and the end of the experiment. Moreover, HR indicates small differences between both easy tasks and also between the rest measurement at the end and all tasks. Surprisingly, no significant difference can be observed between the more difficult tasks. This raises the question of whether the cardiovascular parameters are unable to distinguish finely between more demanding tasks, possibly due to a ceiling effect.

The EEG as a direct signal of brain activity and the frequently observed variability of the \(\theta \)- and \(\alpha \)-band with attention, fatigue and mental workload constitute the theoretical background for the implementation of the LFM method for neuronal mental state monitoring. The results of the proportion analysis of the HLS and LLS are in line with the results expected from the difficulty levels, which follow from the demands the tasks place on the executive functions. The moderate switch-XXX task contains significantly fewer LLS than the easy tasks and the rest measurements. The more demanding AOSPAN task includes even fewer LLS. These differences between AOSPAN and all other conducted measurements were found to be significant.

With respect to the HLS, the AOSPAN task again shows substantially higher values than all other measurements. Considering also its small proportion of LLS, AOSPAN is a high mental workload task. Switch-XXX includes significantly higher proportions of HLS than switch-NUM and the rest measurement at the end. However, no significant differences with respect to the HLS could be found between it and the switch-PAR task or the rest measurement at the beginning. As a side note, this fits well with the assumption that the switch-PAR task is slightly more demanding than the switch-NUM task. If one additionally considers the proportion of LLS in switch-XXX, one can assume that it ranges between the difficult and the easy tasks. Hence, it can be considered a moderate workload task.

The easy tasks show no significant differences from the rest measurement at the beginning, neither with respect to their proportion of LLS nor to their proportion of HLS. Interestingly, the proportions of both HLS and LLS differ significantly from the rest measurement at the end, but only for switch-PAR. This is in line with the accuracy rates, which indicate that the switch-PAR task is slightly more difficult than the switch-NUM task.

To sum up, our results are consistent with the expected increase in the HLS and decrease in the LLS with increasing task difficulty. Based on these findings on neuronal brain states, an optimal task sharing between human and machine could be defined and a moderate mental workload could be achieved. The prevention of negative impacts of sustained over- or underload on the mental health and cognitive capacity of the working population would be the next step to take. To accomplish this, a consolidated study of over- and underload conditions has to be conducted and measured by means of continuous mental workload registration.

Finally, brain state monitoring can contribute to the modulation of workload, protect and advise against overload and underload, and can be used for ergonomic evaluation and improvement of human-machine systems and information intensive occupations.