
1 Introduction

Electroencephalography (EEG), a widely used brain-computer interface (BCI) modality that captures electrical brain activity, allows human intentions to be communicated directly from the brain to a computer system. To date, a number of brain patterns (e.g. P300 [21], motor imagery [20], error-related potential (ErrP) [17]) have been identified and utilized for BCI applications. Among these patterns, ErrP is naturally elicited when the brain observes an erroneous or unexpected behavior, without explicit training or instructing the participant to generate it [28]. Hence, ErrP has the advantage of facilitating natural and intuitive interaction between the brain and the computer. A few pioneering works have employed ErrP in human-machine settings. In [24], ErrP was monitored and decoded to correct robot mistakes. ErrP was also used to map human gestures to robot actions in an interactive setting [14]. Moreover, [28] used ErrP to select the heading of a vehicle in a real-world driving task.

The existing BCI literature has demonstrated the use of BCI to communicate instantaneous/operational commands (e.g. left/right, start/stop) to control an external system on various tasks. Nonetheless, operational-level BCI control is laborious, time-consuming and mentally demanding. Besides, even instances of similar tasks require the human to generate a similar sequence of BCI commands repeatedly. However, little research has extended BCI to communicate tactical- or strategic-level commands, which can perform a set of sequential actions and generalize to solve similar tasks. Hence, we attempt to investigate the possibility of creating a set of higher-level instructions, decision networks or preferences using ErrP-based BCI.

In the air traffic management literature, BCI has mainly been applied to monitor air traffic controllers' (ATCOs) performance. EEG was utilized to derive objective biomarkers of workload [2,3,4]. The findings are useful to ensure the situation awareness of ATCOs: detectors can then be developed to alert on any sign of sleepiness or loss of vigilance. fNIRS was adopted to measure the maturity and expertise of air traffic controllers [5], which can aid the training and selection process. While those works use BCI in a passive manner, our work has a very different objective: we aim to use BCI to transfer preferences actively.

In this work, we select one of the most challenging tasks in air traffic control (i.e. conflict detection and resolution) as an ErrP case study of a higher-level cognitive task. This task requires the air traffic controller (ATCO) to maintain situation awareness of the current and future air traffic condition to ensure a safe and efficient flow of every aircraft in a shared airspace. Much research and development has been carried out on assistance tools for ATCOs to reduce their workload whilst improving their performance. Pioneering work relied on mathematical models of aircraft, conflict scenarios, and airspace structure to compute conflict resolution strategies; an extensive account of model-based approaches can be found in [15]. More recently, second-order cone programming [26], the space-time prism approach [9], model predictive control (MPC) [13, 27], surrounding traffic analysis [22], and large-scale conflict resolution models for velocity maneuvers [1] and 3D conflict resolution [16] have been proposed. These mathematical models can hardly scale up to a large number of aircraft and might fail to describe the complete dynamics of air traffic. Moreover, these automated tools are not fully trusted, as most models behave like a black box to ATCOs. Besides, the advisories might differ considerably from the ATCO's expectation, which leads to a low acceptance rate [11]. Hence, ATCOs have to remain in active control of the management of air traffic.

While mathematical models show their limitations in incorporating human preferences or strategies into their solutions, artificial intelligence (AI) (e.g. deep learning and reinforcement learning (RL)) has achieved superhuman performance in a variety of strategic tasks (e.g. diagnosing a number of cancers, playing the game of Go and Atari games). Recently, the literature on behavior cloning [12] and inverse RL [8, 18, 19, 23] has demonstrated the ability of machines to mimic expert behavior from demonstrations or even infer the reward function underlying human strategies [7, 10, 25]. Following this line of research, RL can be adopted in the air traffic control task to learn how ATCOs perform conflict resolution.

In this study, we aim to integrate the advancements of BCI and RL. Our goal is to develop a BCI framework in which a human can communicate and construct a goal-oriented sequential decision-making or preference command using the ErrP signature. RL can then be used as an engine to incorporate the human's preference into a learning model for conflict resolution. In this paper, we limit our work to filling the research gap on the empirical inquiry of error-related potential for higher-level cognitive tasks (i.e. situation awareness in air traffic control tasks). In order to investigate our hypothesis that ErrP can be adopted in this air traffic monitoring task, we developed a simulated air traffic environment that can simulate different conflict configurations and visualize the projected trajectories of traffic scenarios as well as the advisory trajectory. The environment allows the subject to monitor and assess the advisory naturally. We also designed an experimental protocol that facilitates the generation of ErrP that encapsulates the ATCO's preference. The mapping between ErrP and preference, as well as how it can be useful, is discussed in the succeeding parts of this paper.

The main contributions of this work include (1) the study of ErrP for a higher-level planning task (i.e. air traffic monitoring) compared to previous work on instantaneous control tasks; (2) the development of a simulated air traffic environment to emulate real-world traffic; (3) the design of visual interfaces and an experimental protocol for subjects to input their preferences using ErrP; and (4) experimental evidence that our proposed protocol triggered a brain signature that is consistent with the existing ErrP literature.

The rest of this paper is structured as follows. Section 2 describes the methodology and how the whole BCI framework was designed and implemented to achieve the experimental goal of this work. Section 3 presents the experimental results of the ErrP generated by the subject. Finally, Sect. 4 discusses how the findings and experimental evidence can be used to adapt the advisory tool.

2 Methodology

This is an exploratory study to investigate the extension of ErrP to higher-level cognitive tasks. We implemented the BCI framework and designed an experimental protocol to validate the applicability of ErrP for a real-world problem in air traffic management. The experimental setup and paradigm are illustrated in Fig. 1.

Fig. 1. The experimental setup and diagram of our BCI framework.

2.1 Simulated Environment and Scenario Generation

The simulated environment configures conflict scenarios with two aircraft, as shown in Fig. 2. For simplicity, the airspace was assumed to be a circular area. In the airspace presented to the subject, there is an ownship and an intruder together with their projected trajectories. We restrict the two aircraft to fly at the same lateral speed in the environment. The conflict configuration can be characterized by the conflict angle and the closest point of approach. The advisory tool can propose a heading-change maneuver to resolve the conflict. An example is shown in Fig. 2, and a sketch of such a scenario parameterization is given after the figure.

Fig. 2. An example of conflict configuration for two aircraft. The dashed green line represents a random heading-change maneuver. (Color figure online)
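The listing below is a minimal sketch of how such a scenario and advisory could be parameterized in Python. All names, units and the advisory range are our own illustrative assumptions, not the exact simulator implementation.

# Sketch of a two-aircraft conflict parameterization; names and units are
# illustrative assumptions.
import random

def make_conflict_scenario(conflict_angle_deg, cpa_nm,
                           speed_kts=450.0, airspace_radius_nm=50.0):
    """Characterize a two-aircraft conflict by its conflict angle and
    closest point of approach (CPA); both aircraft fly at the same speed."""
    ownship_heading = random.uniform(0.0, 360.0)
    return {
        "airspace_radius_nm": airspace_radius_nm,
        "ownship": {"heading_deg": ownship_heading, "speed_kts": speed_kts},
        "intruder": {"heading_deg": (ownship_heading + conflict_angle_deg) % 360.0,
                     "speed_kts": speed_kts},
        "conflict_angle_deg": conflict_angle_deg,
        "cpa_nm": cpa_nm,
    }

def heading_change_advisory(scenario, max_delta_deg=30.0):
    """Propose a (possibly random) heading-change maneuver for the ownship,
    shown as the dashed green line in Fig. 2."""
    delta = random.uniform(-max_delta_deg, max_delta_deg)
    new_heading = (scenario["ownship"]["heading_deg"] + delta) % 360.0
    return {"delta_deg": delta, "new_heading_deg": new_heading}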

2.2 Experimental Protocol

The subject was seated in front of a screen displaying a variety of air traffic scenarios. The time-line of the experiment can be found in Fig. 3. A trial starts with a small dot in the middle of the screen accompanied by an audio cue. The color of the dot hints at the quality of the solution shown next (green indicates a good advisory, while red indicates a bad advisory). Subsequently, an air traffic scenario with a resolution advisory is presented to the subject. The ownship is drawn with a black line and the intruder with a gray line. On the ownship trajectory, there are three small circles. The subject monitors and assesses the advisory trajectory of the ownship for 5 s. Next, the middle circle changes from an unfilled to a filled circle to cue the subject to get ready for the next visual event. The middle filled circle then makes a positional change either in the same direction as the advisory trajectory or in the opposite direction. The subject was told that the same direction signifies acceptance of the advisory, while the opposite direction indicates rejection of the advisory. However, the direction is assigned randomly by design. We assume ErrP to be triggered when the assigned direction does not match what the subject expects. A simplified sketch of one trial is given after Fig. 3.

Fig. 3. Time-line of our experimental design. (Color figure online)
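The following is a simplified, console-level sketch of the trial logic; the display is mocked with print(), timings follow the description above, and all names are illustrative assumptions rather than our actual experiment code.

# Sketch of one trial; display calls are mocked and names are assumptions.
import random
import time

def run_trial(advisory_is_good, subject_expectation):
    """subject_expectation is 'accept' or 'reject'. In the real experiment it
    is unobserved and is only inferred afterwards from the presence of ErrP."""
    print("fixation dot:", "green" if advisory_is_good else "red", "+ audio cue")
    print("show scenario: ownship (black), intruder (gray), advisory trajectory")
    time.sleep(5.0)                                    # assessment period
    print("middle circle becomes filled: get ready")
    # The cue direction is assigned randomly by design, independently of the
    # advisory quality; ErrP is expected when it mismatches the expectation.
    cue = random.choice(["same_as_advisory", "opposite"])
    assigned_outcome = "accept" if cue == "same_as_advisory" else "reject"
    errp_expected = (assigned_outcome != subject_expectation)
    return cue, errp_expected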

2.3 EEG Acquisition and Processing

While the subject was performing the task, EEG was acquired at a sampling frequency of 250 Hz using a BrainAmp MR plus EEG device. We custom-made a circuit for EEG event marking using an NI USB-6001, which allows our simulated environment to send triggers to the Brain Vision Recorder. In this study, we select and analyze channel C2. A notch filter was applied to remove the line noise, and the signal was then band-pass filtered between 1 and 5 Hz. During the experiment, the subject was instructed to avoid muscle movements that can induce artifacts in the signal. The recorded signal was subsequently inspected to reject artifact trials. EEGLAB was used to extract the signal segments from 0.5 s before to 1.5 s after the positional change of the circle on the ownship trajectory. The sketch below illustrates this processing chain.
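As a concrete illustration, the following Python sketch reproduces the processing chain with scipy (the original processing used Brain Vision tooling and EEGLAB); the 50 Hz notch frequency and the filter orders are assumptions.

# Sketch of the preprocessing chain: notch filter, 1-5 Hz band-pass, and
# epoch extraction from -0.5 s to +1.5 s around each trigger.
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 250.0  # sampling frequency in Hz

def preprocess_channel(x, notch_hz=50.0, band=(1.0, 5.0)):
    """Filter a single-channel EEG signal (e.g. channel C2)."""
    b_n, a_n = iirnotch(w0=notch_hz, Q=30.0, fs=FS)
    x = filtfilt(b_n, a_n, x)
    b_bp, a_bp = butter(4, band, btype="bandpass", fs=FS)
    return filtfilt(b_bp, a_bp, x)

def extract_epochs(x, event_samples, pre_s=0.5, post_s=1.5):
    """Cut segments around each positional-change trigger (sample indices)."""
    pre, post = int(pre_s * FS), int(post_s * FS)
    return np.array([x[s - pre:s + post] for s in event_samples
                     if s - pre >= 0 and s + post <= len(x)])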

3 Results

The event-related potentials of the experiment are shown in Fig. 4. Trials with positional changes that match the subject's expectation are shown in Fig. 4 (left) and trials with positional changes that do not match the subject's expectation are shown in Fig. 4 (right). Comparing the two figures, consistent event-related potentials were observed for trials with positional changes that do not match the subject's expectation. The difference between the ErrP and Non-ErrP conditions is shown in Fig. 5. The shape of the potential is similar to the ErrP reported in [6], a negative deflection followed by a positive peak. However, we also observed a longer ErrP latency compared to [6].

Fig. 4. Visualization of event-related potentials. (Left) Non-ErrP. (Right) ErrP.
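For completeness, a minimal sketch of how the condition averages in Fig. 4 and the difference wave in Fig. 5 could be computed from the extracted epochs (variable names are assumptions):

import numpy as np

def condition_erps(errp_epochs, non_errp_epochs):
    """Average epochs within each condition and return the difference wave
    (ErrP minus Non-ErrP), as visualized in Figs. 4 and 5."""
    erp_err = np.asarray(errp_epochs).mean(axis=0)
    erp_non = np.asarray(non_errp_epochs).mean(axis=0)
    return erp_err, erp_non, erp_err - erp_non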

4 Discussion

The result of the experiment demonstrates that ErrP can be extended to conflict resolution in air traffic control, a higher-level cognitive task. Based on the ErrP finding, we obtained experimental evidence that human preference can be encapsulated in the ErrP signature. Decoding this signature can be used to adapt an intelligent agent to behave in a way the human will trust. While the resolution advisory was hand designed in this experiment, it is possible to implement the advisory model using inverse RL, where the model initially proposes random maneuvers and gains air traffic control skill through iterative interaction with ATCOs.

In the following subsections, we discuss how the ErrP obtained from our protocol can be used to adapt the advisory tool.

Table 1. Mapping between ErrP and preference.

4.1 Mapping Between ErrP and Preference

ErrP is expected to be triggered when the outcome of the positional change does not match the subject's expectation. Hence, the expected positional change can be derived from the existence of ErrP and the outcome of the assigned positional change. The mapping between ErrP and preference (accept or reject) can be found in Table 1, and a sketch of the corresponding decision rule is given below.
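A minimal sketch of this decision rule (names are illustrative): the inferred preference equals the assigned outcome, flipped whenever an ErrP is detected.

def infer_preference(assigned_outcome, errp_detected):
    """assigned_outcome: 'accept' if the cue moved with the advisory,
    'reject' if it moved in the opposite direction."""
    if not errp_detected:
        return assigned_outcome               # outcome matched expectation
    return "reject" if assigned_outcome == "accept" else "accept"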

4.2 ErrP as Building Block for Reinforcement Learning

The mapping obtained in Subsect. 4.1 can be used as a reinforcement signal to adapt the advisory tool. One straightforward approach is to adopt reinforcement learning and define the reward function using the ATCO's acceptance.

Fig. 5. The difference between ErrP and Non-ErrP.

Let us define the reward function \(\mathcal {R}_0\) as follows:

$$\begin{aligned} \mathcal {R}_0(resolution) = {\left\{ \begin{array}{ll} 1 &{} Preference = Accept \\ -1 &{} Preference = Reject \\ 0 &{} Preference = Not\, Available \end{array}\right. } \end{aligned}$$
(1)

The reward function \(\mathcal {R}_0\) can be used in addition to an environmental reward function \(\mathcal {R}_1\), which assesses the quality of the resolution based on common criteria such as deviation, travel distance, travel time or fuel consumption. These rewards can be combined by a weighted sum.

$$\begin{aligned} \mathcal {R}(resolution) = \omega _0\mathcal {R}_0(resolution) + \omega _1\mathcal {R}_1(resolution) \end{aligned}$$
(2)
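A minimal sketch of Eqs. (1) and (2); the weights and the form of the environmental quality term are assumptions for illustration only.

def preference_reward(preference):
    """R_0: +1 for Accept, -1 for Reject, 0 when no preference is available."""
    return {"accept": 1.0, "reject": -1.0, None: 0.0}[preference]

def combined_reward(preference, env_quality, w0=1.0, w1=1.0):
    """R = w0 * R_0 + w1 * R_1, where env_quality plays the role of R_1
    (e.g. based on deviation, travel distance, travel time or fuel)."""
    return w0 * preference_reward(preference) + w1 * env_quality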

4.3 Possible Future Work

As the focus of this work is to study ErrP for higher-level cognitive tasks, several restrictions were made to control and simplify the simulated environment. Relaxing these restrictions and assumptions step by step is important to improve the practicality of this work for solving real-world air traffic problems.

While the framework is designed for the air traffic task, it is also applicable to amyotrophic lateral sclerosis (ALS) patients, who could use it to construct goal-oriented sequential decision-making commands.