1 Introduction

A Brain Computer Interface (BCI) is a system that measures Central Nervous System (CNS) activity and converts it into artificial output that can replace or improve natural CNS output [1]. In other words, a BCI is a neuro-imaging system that maps an individual's neural signals to his or her cognitive state. Signal acquisition, feature extraction, and signal classification are the basic building blocks of a BCI system, and accurate motor imagery signal classification is key for such a system to work properly.

BCI systems can be divided into two main categories according to the signal acquisition method: invasive and non-invasive. In an invasive BCI system, electrodes are implanted inside the brain to acquire brain signals, while in a non-invasive BCI system, brain signals are recorded from the scalp. EEG is the most popular non-invasive method for acquiring brain signals, as it is economical, portable and reliable, and has excellent temporal resolution.

Brain waves recorded via EEG are categorized into 5 major frequency bands: delta waves (0.5 Hz–3 Hz), theta waves (4 Hz–7 Hz), alpha waves (8 Hz–13 Hz), beta waves (14 Hz–30 Hz) and gamma waves (\(>30\,\text{Hz}\)) [2]. Various studies have found that the neural signal recorded during motor imagery consists mostly of alpha and beta waves. Feature extraction in the frequency domain prior to classification is therefore an important preprocessing step for motor imagery signal classification.

Common Spatial Pattern (CSP) [3, 4] is one of the most widely used methods in the BCI field for extracting features from motor imagery data. The winner of the BCI IV 2008 competition used a variant of CSP to attain state-of-the-art results. Other feature extraction methods, such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA) [5], are also used. Earlier work mostly employed conventional machine learning algorithms such as Support Vector Machine (SVM), Bayesian classifier, nearest neighbour classifier and Linear Discriminant Analysis (LDA) for motor imagery signal classification [6,7,8]. With recent developments in deep learning, algorithms such as Convolutional Neural Networks (CNN) [9,10,11], Recurrent Neural Networks (RNN) [12] and Restricted Boltzmann Machines (RBM) [13] are also being applied to motor imagery signal classification. We propose another deep learning technique, which is a multi-task learning based approach.

A multi-task model is preferred over a single task model when two or more tasks are correlated with each other. It leverages the correlation among the tasks and learns better features from the shared representation than a single task model does. Multi-task learning has proved effective in various domains. In natural language processing, for example, learning related tasks such as emotion, sentiment and depression detection with a multi-task model has been shown to produce better results than single task models [14, 15]. A multi-task framework achieves better generalization, improves the performance of each task and requires only one unified model for all the tasks. We have implemented multi-task learning for the classification of motor imagery signals, where each subject's motor imagery is considered a separate task.

In this study, the BCI IV-2b dataset (described in detail in Sect. 4.1), consisting of motor imagery data for 9 different subjects, is used. The main issue with motor imagery signal classification is that different individuals have different EEG signatures even for the same motor imagination. That is, two persons performing the same motor imagery task will have slightly different EEG recordings, which makes classification difficult. To address this problem, most studies train separate models for different subjects. Although the EEG signatures differ between subjects, the per-subject classification tasks are highly correlated because they concern the same motor imagination. Therefore, we propose a single multi-task model, so that instead of training 9 different models for 9 subjects, one unified model handles all of them. In our multi-task setup, motor imagery classification of each subject is considered a different task. We have experimented with three kinds of multi-task architectures, namely, fully-shared, private-shared and adversarial multi-task models. Subject specific models, i.e., single task models, were also trained and compared with the multi-task models. The results obtained with a single multi-task model were better than those of the 9 subject specific single task models.

The remainder of the paper is organised as follows: Sect. 2 discusses related work on motor imagery classification. The proposed multi-task learning approach is described in Sect. 3. Section 4 describes the dataset used and the experimental results. Section 5 summarizes the results of this work and draws conclusions.

2 Related Works

Various approaches for motor imagery signal classification are described in this section. Before the advances in deep learning, conventional machine learning algorithms such as support vector machines, Bayesian classifiers and nearest neighbour classifiers were mostly used for motor imagery signal classification. In 2008, the BCI-IV competition was held, in which motor imagery signal classification had to be carried out for 9 subjects. Various conventional machine learning methods were proposed to solve this problem. Filter Bank Common Spatial Pattern (FBCSP), a machine learning approach proposed by Ang et al. [3], won that competition. In FBCSP, common spatial filtering was applied to the band-pass filtered raw EEG data. After feature extraction, classification was carried out using machine learning algorithms such as Bayesian classifier, K-nearest neighbour, SVM and LDA.

Deep learning based algorithms for motor imagery signal classification have also been proposed in recent years. The authors in [13] adopted a deep learning scheme based on Restricted Boltzmann Machines (RBMs) for motor imagery classification. They first converted the time domain EEG signal into the frequency domain using the Fast Fourier Transform (FFT) and then used Wavelet Packet Decomposition to extract features. A combined CNN and SAE model for motor imagery classification has been proposed in [9]. In that work, the Short Time Fourier Transform (STFT) of the band-pass filtered raw EEG signals was computed to obtain an image representation of the EEG data, which was then used to train the model. Ping et al. [12] proposed a Long Short Term Memory (LSTM) framework in which one dimension-aggregate approximation is used for EEG feature extraction, and further employed a channel weighting technique to improve their model. Ko et al. [16] introduced a Recurrent Spatio-Temporal Neural Network (RSTNN) framework for motor imagery signal classification. In RSTNN, EEG feature extraction is done in two parts, namely, a temporal feature extractor and a spatial feature extractor, and three neural networks are used for classification.

As discussed earlier, the EEG signature is subject specific even when different subjects perform the same motor imagination. To deal with this, most studies simply train a separate model for each subject. However, a subject specific model cannot learn hidden features that are common to all subjects. Thus, in this study, we have experimented with three different types of multi-task architectures and compared their results with subject specific single task models.

3 Proposed Methodology

In order to classify motor imagery signals, we have implemented three different kinds of multi-task architectures. Motor imagery data of 9 different subjects from the BCI IV 2008 competition has been used. This dataset is described in detail in Sect. 4.1. The following subsections explain the preprocessing steps and the different multi-task architectures.

Fig. 1. Preprocessing pipeline

3.1 Preprocessing

We have designed our preprocessing pipeline similar to that of Tabar et al. [9]. EEG data from the BCI competition IV-2b dataset has been used. It consists of 3-channel EEG recordings (C3, Cz, C4) of 9 different subjects performing motor imagination of their left and right hands. First, EEG epoching was employed, i.e., we extracted all the motor imagery trials from the continuous raw EEG time series. Different studies have reported that frequency domain features provide better results than time domain features for motor imagery signal classification [13]. We therefore used the STFT, so that both time domain and frequency domain information can be leveraged. We computed the STFT of each motor imagery trial with a window size of 0.128 s and a time lapse of 0.028 s. The resulting spectrogram was band-pass filtered for alpha waves (8 Hz–13 Hz) and beta waves (14 Hz–30 Hz). We also normalized the band-pass filtered spectrogram, which is in image form, so that it is suitable for training the various neural network models. Finally, the three STFT outputs corresponding to the three electrodes were stacked on top of each other, as shown in Fig. 1. In this way, the input image used for training carries time, frequency and location information of the EEG signals.
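The preprocessing steps described above can be sketched roughly as follows. This is only an illustrative NumPy sketch, not the exact pipeline used in our experiments: the Hann window, the zero-padded FFT length `N_FFT = 512`, the min-max normalization, the simple concatenation of alpha and beta rows, and the 2 s trial length in the usage lines are all assumptions made for the example. Band-pass filtering of the spectrogram is approximated here by keeping only the frequency rows inside each band.

```python
import numpy as np

FS = 250                        # sampling rate in Hz (from the dataset description)
WIN = int(round(0.128 * FS))    # STFT window: 0.128 s -> 32 samples
HOP = int(round(0.028 * FS))    # time lapse: 0.028 s -> 7 samples
N_FFT = 512                     # assumed zero-padded FFT length (not stated in the text)

def stft_magnitude(x, win=WIN, hop=HOP, n_fft=N_FFT):
    """Magnitude spectrogram of a 1-D signal, shape (n_freq_bins, n_frames)."""
    window = np.hanning(win)    # assumed window function
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T

def band_rows(spec, lo, hi, fs=FS, n_fft=N_FFT):
    """Keep only the spectrogram rows whose frequency lies in [lo, hi] Hz."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return spec[(freqs >= lo) & (freqs <= hi)]

def trial_to_image(trial):
    """trial: array of shape (3, n_samples) for C3, Cz, C4 -> stacked 2-D image."""
    blocks = []
    for channel in trial:
        spec = stft_magnitude(channel)
        # alpha (8-13 Hz) and beta (14-30 Hz) rows of this channel
        bands = np.vstack([band_rows(spec, 8, 13), band_rows(spec, 14, 30)])
        span = bands.max() - bands.min()
        blocks.append((bands - bands.min()) / (span if span > 0 else 1.0))
    return np.vstack(blocks)    # channels stacked on top of each other

# Usage with synthetic data standing in for one epoched trial (assumed 2 s long)
rng = np.random.default_rng(0)
image = trial_to_image(rng.standard_normal((3, 500)))
print(image.shape)
```

The vertical axis of the resulting image encodes frequency and electrode location, and the horizontal axis encodes time.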

3.2 Learning Architectures

In this section, we describe the different learning architectures used for motor imagery signal classification. As discussed earlier, the EEG signature differs between individuals even for the same motor imagination, so we first trained 9 subject specific models, one for each subject. Different multi-task models were then implemented to learn the motor imagery signals of all 9 subjects with one model. The multi-task models implemented are the fully-shared, private-shared and adversarial-shared multi-task models.

Fig. 2. Conventional multi-task model

Fig. 3. Private-shared multi-task model

Single Task Model. Subject specific models, i.e., separate single task models, were trained for each of the 9 subjects. The single task model consisted of 3 convolution-pooling layers, 1 fully connected layer and a softmax output layer. Batch normalization was used to allow each layer to learn more independently. Raw EEG signals were transformed into images with the preprocessing pipeline described in Sect. 3.1, and these images were given as inputs to the CNN. CNNs were used because they are effective at extracting discriminative features from image representations. Outputs from the CNN were fed to a fully connected layer and then to a softmax classifier for motor imagery classification.

Conventional Multi-task Model. This can also be called the fully-shared multi-task model. It consisted of three fully shared convolution-pooling layers and a fully connected layer, as shown in Fig. 2. In place of the single softmax output layer of the single task model, it had 9 separate softmax output layers, each corresponding to the motor imagery classification labels of a particular subject. The three convolution-pooling layers and the fully connected layer were common to all the subjects, and the softmax output layers were the task specific layers. When several tasks are highly correlated (here, motor imagery classification for all the subjects), it is better to train a single multi-task model than several single task models. However, the caveat with this conventional multi-task model is that it cannot learn subject specific features properly, because the task specific layers are not private to each subject. These task specific layers are directly connected to the shared layers, which hampers the learning of subject specific traits. To address this issue, a private-shared multi-task model was trained.
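The fully-shared structure described above can be illustrated with a minimal NumPy forward pass. This is a sketch under simplifying assumptions rather than our trained network: a single dense ReLU layer stands in for the three convolution-pooling layers and the fully connected layer, the weights are random, and `IN_DIM` and `HID_DIM` are hypothetical sizes chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SUBJECTS, N_CLASSES = 9, 2     # 9 subjects, left/right hand imagery (from the text)
IN_DIM, HID_DIM = 64, 16         # hypothetical flattened input size and hidden width

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Shared trunk: one dense layer standing in for the shared conv-pooling + FC layers.
W_shared = rng.normal(0, 0.1, (IN_DIM, HID_DIM))
# One softmax output head per subject; these are the only task specific parameters.
W_heads = rng.normal(0, 0.1, (N_SUBJECTS, HID_DIM, N_CLASSES))

def fully_shared_forward(x, subject):
    """x: (batch, IN_DIM); subject: index 0..8 selecting the output head."""
    h = np.maximum(x @ W_shared, 0.0)        # shared features, same for every subject
    return softmax(h @ W_heads[subject])     # subject specific softmax layer

probs = fully_shared_forward(rng.normal(size=(4, IN_DIM)), subject=3)
print(probs.shape)
```

Because every head reads the same shared features, a gradient update driven by one subject's data changes the features seen by all other heads, which is the limitation noted above.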

Private-Shared Multi-task Model. This model consists of a dedicated task specific network for each subject and a shared network common to all subjects, as shown in Fig. 3. Unlike the fully shared multi-task network, where the task specific layers originate from the shared layers, it has a completely independent private network for each subject. Input EEG images are fed to the private and shared layers separately, and their outputs are concatenated and then fed to a fully connected layer and a softmax layer. The caveat with this model is that it becomes relatively complex, with 9 different private networks. It is almost equivalent to training nine subject specific models together with a shared network, and the sole purpose of the 9 private networks is to learn subject specific features. A simple adversarial network can also be employed for this purpose: instead of training nine separate private networks, subject specific traits can be learned with the generator part of a simple adversarial network. Therefore, we also experimented with an adversarial-shared multi-task model.
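A minimal forward pass of the private-shared idea is sketched below. It is only illustrative: dense ReLU layers stand in for the convolutional sub-networks, the weights are random, the sizes `IN_DIM` and `HID_DIM` are hypothetical, and the use of a separate output layer per subject is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SUBJECTS, N_CLASSES = 9, 2     # from the text
IN_DIM, HID_DIM = 64, 16         # hypothetical sizes for the example

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

W_shared = rng.normal(0, 0.1, (IN_DIM, HID_DIM))                   # 1 shared network
W_private = rng.normal(0, 0.1, (N_SUBJECTS, IN_DIM, HID_DIM))      # 9 private networks
W_out = rng.normal(0, 0.1, (N_SUBJECTS, 2 * HID_DIM, N_CLASSES))   # assumed per-subject output

def private_shared_forward(x, subject):
    """x: (batch, IN_DIM); the same input goes to the private and shared branches."""
    h_shared = relu(x @ W_shared)
    h_private = relu(x @ W_private[subject])            # independent of the shared branch
    h = np.concatenate([h_private, h_shared], axis=1)   # fuse by concatenation
    return softmax(h @ W_out[subject])

probs = private_shared_forward(rng.normal(size=(4, IN_DIM)), subject=0)
print(probs.shape, W_private.shape[0] + 1)   # batch of predictions, number of sub-networks
```

The private parameters grow linearly with the number of subjects, which is the complexity concern raised above.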

Adversarial Multi-task Model. This architecture consists of a shared network and an adversarial network, as shown in Fig. 4. It is a much simpler architecture than the private-shared multi-task architecture described above. In place of nine different private networks, it has one adversarial network whose task is to learn task specific features. The generator in the adversarial network tries to learn subject specific features, and the discriminator performs subject classification. During training, the adversarial network thus learns subject specific features, while the shared network learns hidden features that are common to all the subjects. Finally, the outputs of the generator network (subject specific features) and the shared network are fused to make the final prediction. This design encourages the task specific layers and the shared layers to learn different sets of parameters.
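The data flow of this architecture can be sketched as a forward pass that returns the two quantities involved in training: the motor imagery classification loss and the subject classification (discriminator) loss. This is a hedged illustration only. Dense layers stand in for the actual sub-networks, the weights are random, the sizes are hypothetical, and how the two losses are weighted and optimized is deliberately left out because it is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SUBJECTS, N_CLASSES = 9, 2     # from the text
IN_DIM, HID_DIM = 64, 16         # hypothetical sizes for the example

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer labels under row-wise probabilities."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

W_shared = rng.normal(0, 0.1, (IN_DIM, HID_DIM))        # shared network
W_gen = rng.normal(0, 0.1, (IN_DIM, HID_DIM))           # generator: subject specific features
W_disc = rng.normal(0, 0.1, (HID_DIM, N_SUBJECTS))      # discriminator: subject classifier
W_out = rng.normal(0, 0.1, (2 * HID_DIM, N_CLASSES))    # final motor imagery classifier

def adversarial_forward(x, y, subject_ids):
    h_shared = relu(x @ W_shared)
    h_gen = relu(x @ W_gen)
    p_subject = softmax(h_gen @ W_disc)                 # which subject produced x?
    fused = np.concatenate([h_gen, h_shared], axis=1)   # fuse generator and shared outputs
    p_class = softmax(fused @ W_out)                    # left vs right hand
    return cross_entropy(p_class, y), cross_entropy(p_subject, subject_ids)

x = rng.normal(size=(6, IN_DIM))
task_loss, subject_loss = adversarial_forward(
    x, np.array([0, 1, 0, 1, 0, 1]), np.array([0, 1, 2, 3, 4, 5]))
print(task_loss, subject_loss)
```

Note that a single generator replaces the nine private networks, so the number of subject specific parameters no longer grows with the number of subjects.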

Fig. 4. Adversarial-shared multi-task model

4 Dataset, Results and Discussion

4.1 Dataset

We have conducted our experiments with the BCI IV-2b dataset [17]. This dataset consists of 3-channel EEG recordings (C3, Cz, C4), sampled at 250 Hz, of 9 subjects performing motor imagination. All 9 subjects performed two different motor imagery tasks, viz. right hand and left hand motor imagination. Motor imagery data was recorded in 5 sessions. Each session comprised 6 runs separated by short breaks, and each run comprised 20 trials (10 trials for left hand motor imagination and 10 for right hand motor imagination). Of the five sessions, the first two were conducted without feedback and the last three with feedback. Labels were provided for the first three sessions. In our study, we have used the first three sessions for training and testing. For all subjects except subject 1, the first two sessions and 50% of the third session are used for training, and the remaining 50% of the third session is used for testing. For subject 1, the second session EEG data was missing; therefore, only the first session and 50% of the third session are used for training, and the remaining 50% of the third session is used for testing.
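Taking the session structure described above at face value (6 runs of 20 trials per session), the size of the training and test sets per subject can be worked out as in the short sketch below. The function name `split_sizes` is hypothetical and introduced only for this illustration; any trials rejected during recording or preprocessing are not accounted for.

```python
RUNS_PER_SESSION = 6
TRIALS_PER_RUN = 20
TRIALS_PER_SESSION = RUNS_PER_SESSION * TRIALS_PER_RUN   # 120 trials

def split_sizes(full_training_sessions):
    """Return (n_train, n_test) when half of the third session is held out for testing."""
    half_third = TRIALS_PER_SESSION // 2
    n_train = full_training_sessions * TRIALS_PER_SESSION + half_third
    n_test = TRIALS_PER_SESSION - half_third
    return n_train, n_test

print(split_sizes(2))   # subjects 2 to 9: (300, 60)
print(split_sizes(1))   # subject 1, second session missing: (180, 60)
```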

4.2 Comparison of Results of Single Task and Multi-task Models

Results for the single task models, i.e., the subject specific models, and the multi-task model are presented in Table 1. The results illustrate that motor imagery signals are classified better by a model trained on the EEG data of multiple subjects than by a model trained on a single subject. A paired t-test revealed a significant difference between the results (p = 0.016). At first this might seem surprising: if EEG data differs between subjects, a subject specific model would be expected to perform better. However, since all the subjects perform the same motor imagination, i.e., they think of moving either their left or their right hand, there must be some common features associated with a given motor imagination that single subject models are not able to decode. When a model is trained on the motor imagery EEG data of multiple subjects, hidden features that are common to a particular motor imagination across all subjects are captured, which in turn gives better classification accuracy.

Table 1. Results for subject specific trained model and multi-task learnt model

4.3 Comparison of Different Multi-task Models

As discussed in Sect. 3.2, we have implemented three different multi-task model architectures. The results in Table 2 illustrate that the conventional multi-task model, i.e., the fully shared multi-task model, does not perform as well as the other multi-task architectures. The reason is that fully-shared multi-task models are not able to learn task specific features for highly correlated tasks. Although these models have task specific layers, they do not have dedicated private layers. The weights in the task specific layers are influenced by other subjects' data through the fully shared layers, which are common to all the subjects. With fully shared layers, the conventional multi-task model is able to learn features that are common to all the subjects, but its task specific layers are not designed well enough to distinguish between correlated tasks.

To overcome this issue, dedicated private layers were added to the multi-task architecture. The private-shared multi-task architecture was thus able to learn subject specific features that were not influenced by the shared features. Table 2 shows that the private-shared multi-task model performed better than the fully-shared multi-task model. However, the caveat with this model is that it is comparatively complex. As can be seen from Fig. 3, the private-shared multi-task architecture consists of 10 different sub-networks (9 subject specific networks and 1 shared network). This is almost the same as training 10 different models, 9 of which serve only to capture subject specific features.

In the adversarial multi-task model, instead of those 9 sub-networks, we train only one adversarial network whose generator's task is to generate subject specific features. With the adversarial network, a single network takes over the role of all 9 private sub-networks. A paired t-test revealed no significant difference between the private-shared and adversarial multi-task architectures (p = 0.183); nevertheless, the results in Table 2 show that the adversarial multi-task network attained better results than the private-shared multi-task architecture.

Table 2. Comparison of different multi-task models

4.4 Comparison of Multi-task Model with State-of-the-Art Methods

We have also compared the results of our multi-task model with the state-of-the-art model, i.e., the CNN-SAE model [9], and with the algorithm of the BCI IV 2008 competition winner, i.e., FBCSP [4]. For the CNN-SAE model, the authors used the first 3 sessions for training and testing. The authors in [4] used different sessions for different subjects for training their FBCSP model, based on an exhaustive search. Most researchers have used kappa as the evaluation metric for motor imagery signal classification. Kappa is used as an evaluation metric in classification problems because it removes the effect of agreement that would be obtained by chance. The kappa value is defined as follows:

$$\kappa = \frac{A_{o} - A_{e}}{1 - A_{e}} \tag{1}$$

Here, \(A_{o}\) is the actual (observed) accuracy and \(A_{e}\) is the accuracy expected by chance.
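As a worked illustration of Eq. (1), the sketch below computes kappa from true and predicted labels, estimating \(A_{e}\) from the label frequencies as in Cohen's kappa. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from our experiments. For a balanced two-class problem \(A_{e} = 0.5\), so an accuracy of 0.8 corresponds to \(\kappa = (0.8 - 0.5)/(1 - 0.5) = 0.6\).

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Kappa = (A_o - A_e) / (1 - A_e); undefined when A_e equals 1."""
    n = len(y_true)
    # A_o: fraction of trials classified correctly
    a_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # A_e: chance agreement from the marginal label frequencies
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    a_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (a_o - a_e) / (1 - a_e)

# Toy example: 10 trials, 5 per class, 8 classified correctly (accuracy 0.8)
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
print(round(cohen_kappa(y_true, y_pred), 3))   # 0.6
```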

Table 3 presents the kappa values attained by the CNN-SAE, FBCSP and proposed multi-task models. The results show that our proposed multi-task model outperforms both the winner's algorithm and the state-of-the-art model. Although our multi-task model surpasses the CNN-SAE model only by a small margin, it is a much simpler model than the CNN-SAE model. Moreover, the CNN-SAE model used in [9] is trained separately for each subject, while ours is a single unified model trained for all the subjects. The main takeaway is that a multi-task model learns some common hidden features that are not learnt by subject specific models.

Table 3. Comparisons of FBCSP, CNN-SAE and Multi-task learnt models

4.5 Effect of Different Hyperparameters

We tried different numbers of hidden layers, i.e., 2 layers (1 shared and 1 task-specific), 3 layers (2 shared and 1 task-specific), 4 layers (2 shared and 2 task-specific), 5 layers (3 shared and 2 task-specific) and 6 layers (3 shared and 2 task-specific). The conventional multi-task model performed best with 5 layers. The private-shared and adversarial multi-task models performed best with 4 layers in their task-specific and shared networks. We also experimented with different learning rates and found that a variable learning rate attained better results for motor imagery signal classification. The learning rates of both the motor imagery classification model and the subject classification (discriminator) model were set to vary with classification accuracy.

5 Conclusion

In this paper, we have proposed a multi-task learning approach for motor imagery signal classification. Although the EEG signature for the same motor imagination differs between persons, the signatures are highly correlated, and there are some hidden features that cannot be learnt by a subject specific model. We have shown that motor imagery classification for a particular subject can benefit from concurrent learning of the motor imagery of other subjects. Raw EEG time series data was first converted into image form with the help of the STFT and then fed into different neural networks. We have experimented with three multi-task architectures, viz. the fully shared, private-shared and adversarial-shared multi-task models. Our experiments showed that the multi-task model performs better than the subject specific models. Among the multi-task architectures, the adversarial-shared multi-task model attains the best classification accuracy. The motor imagery signal classification model can be improved further, as we only experimented with a vanilla CNN and performed simple concatenation of the shared and private layers before prediction. Future studies could explore this area further by implementing different model architectures to improve motor imagery classification accuracy.