Multi-kernel extreme learning machine for EEG classification in brain-computer interfaces
Introduction
A brain-computer interface (BCI) is a technique that establishes direct communication between a human brain and a computer (Jin, Sellers, Zhou, Zhang, Wang, Cichocki, 2015; Li, Pan, Long, Yu, Wang, Yu, Wu, 2016; Wolpaw, Birbaumer, McFarland, Pfurtscheller, Vaughan, 2002). By recognizing a task-related electroencephalogram (EEG) pattern, a BCI translates the user's mental state into computer commands and provides a promising approach to restoring environmental control for people with disabilities (Wang et al., 2016). Currently, the most widely adopted EEG patterns for BCI development include sensorimotor rhythms (SMRs), event-related potentials and steady-state visual evoked potentials (Chen, Wang, Nakanishi, Gao, Jung, Gao, 2015; Jiao, Zhang, Wang, Wang, Jin, Wang, 2017; Jin, Daly, Zhang, Wang, Cichocki, 2014; Ma, Zhang, Cichocki, Matsuno, 2015; Pfurtscheller, Brunner, Schlögl, Lopes, 2006; Shi, Wang, Zhang, 2015; Yu, Li, Long, Gu, 2012; Zhang, Zhou, Jin, Wang, Cichocki, 2014). Event-related desynchronization (ERD) is a significant power decrease of SMRs occurring at the contralateral sensorimotor area during the imagination of unilateral hand movements. Accordingly, a motor-imagery (MI) based BCI detects the desired commands by classifying MI tasks according to ERD features (Yu et al., 2015).
A direct approach for recognizing ERD features is to measure the variance difference between the left and right hemispheres (i.e., electrodes C3 and C4). However, this simple method is likely to give poor classification accuracy if the signals of the two electrodes are contaminated by noise. Extensive research efforts have therefore been dedicated to improving EEG feature extraction and classification for BCI applications (Cong, Lin, Kuang, Gong, Astikainen, Ristaniemi, 2015; Das, Suresh, Sundararajan, 2016; García-Laencina, Rodríguez-Bermudez, Roca-Dorda, 2014; Liu, Yu, Wu, Gu, Li, 2015; Nguyen, Khosravi, Creighton, Nahavandi, 2015; da Silveira, Kozakevicius, Rodrigues, 2016; Wu, Chen, Gao, Brown, 2011; Zhang, Zhou, Jin, Zhao, Wang, Cichocki, 2016; Zhou, Zhao, Zhang, Adalı, Xie, Cichocki, 2016). Common spatial pattern (CSP) seeks spatial filters over multichannel EEG that maximize the variance of the projected signal from one class while minimizing it for the other class. In recent years, CSP and its variants have been widely applied to robust feature extraction for improving MI-related EEG classification (Arvaneh, Guan, Ang, Quek, 2013; Blankertz, Tomioka, Lemm, Kawanabe, Müller, 2008; Lotte, Guan, 2011; Wu, Chen, Gao, Li, Brown, Gao, 2015; Wu, Wu, Gao, Liu, Li, Gao, 2014).
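The CSP idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `csp_filters` and `log_var_features`, the trace normalization of trial covariances, the whitening-based solution, and the log-variance features are all assumptions chosen for illustration.

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_pairs=1):
    """Illustrative CSP. trials_*: arrays of shape (n_trials, channels, samples),
    assumed bandpass filtered and mean-removed. Returns (2 * n_pairs, channels)."""
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over trials (an assumption).
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    cov_a, cov_b = mean_cov(trials_a), mean_cov(trials_b)
    # Whiten the composite covariance so that P (cov_a + cov_b) P = I.
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Eigenvectors of the whitened class-a covariance; eigenvalues are ascending.
    d, u = np.linalg.eigh(whitener @ cov_a @ whitener)
    filters = (whitener @ u).T  # each row is one spatial filter
    # Keep filters from both ends: lowest and highest class-a variance ratio.
    picks = np.r_[np.arange(n_pairs), np.arange(len(d) - n_pairs, len(d))]
    return filters[picks]

def log_var_features(trials, filters):
    """Project each trial and return the log of the normalized variance per filter."""
    projected = np.einsum('fc,ncs->nfs', filters, trials)
    var = projected.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))
```

With `n_pairs=1`, each trial is reduced to two features: the first filter favors the second class and the last filter favors the first class, so the two features move in opposite directions across classes.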
Another important issue for EEG classification of MI tasks is how to design a powerful classifier with strong generalization capability (Lotte, Congedo, Lecuyer, Lamarche, Arnaldi, 2007; Zhang et al., 2014; Zhang, Wang, Jin, Wang, 2017). Linear discriminant analysis (LDA) is a relatively simple algorithm and generally works well for pattern classification if the sample covariance matrices are similar among different classes (Krusienski et al., 2006). However, this assumption might not be met for ERD features, so LDA may overfit and fail to achieve good classification accuracy. Some regularization-based classification algorithms have recently been developed and increasingly applied to EEG analysis. For instance, shrinkage LDA remedies ill-conditioned covariance matrices with a shrinkage covariance estimator and significantly enhances classification accuracy, especially in small sample size scenarios (Blankertz, Lemm, Treder, Haufe, & Müller, 2011). The well-known support vector machine (SVM) adopts soft margin regularization to achieve good generalization capability (Lotte et al., 2007). The kernel extension of SVM usually performs well on the classification of nonlinear features in EEG signals. Currently, the combination of CSP and SVM has become one of the most popular methods for MI classification (Qiu, Allison, Jin, Zhang, Wang, Li, Cichocki, 2017; Zhang, Zhou, Jin, Wang, Cichocki, 2015). In addition, a sparse representation-based scheme using l1-norm regularization was proposed for MI classification and showed better performance than LDA (Li, Yu, Bi, Xu, Gu, Amari, 2014; Shin, Lee, Lee, Lee, 2012).
In recent years, extreme learning machine (ELM) has attracted increasing attention from researchers in the pattern recognition field (Avci, 2013; Avci, Coteli, 2012; Huang, Huang, Song, You, 2015). ELM was originally proposed by Huang, Zhu, and Siew (2006) for training single hidden layer feedforward neural networks, such as multilayer perceptron (MLP). The hidden layer weights in ELM are randomly initialized and then fixed without iterative tuning. The output weights are obtained by computing the Moore–Penrose generalized inverse of the hidden layer output matrix, so that ELM achieves not only the smallest training error but also the smallest norm of output weights. ELM basically consists of two processing steps: (1) random mapping of the input space to the ELM feature space and (2) learning of an appropriate linear projection for classification. Some empirical studies have suggested that ELM provides comparable or even better generalization capability than SVM and its variants (Chen, Ou, 2011; Huang, 2014). Furthermore, a probabilistic version of ELM has been developed to estimate the probability distribution of output values instead of fitting data, thereby alleviating the overfitting problem. Our previous study (Zhang, Jin, Wang, & Wang, 2016) validated the effectiveness of Bayesian ELM for EEG classification. However, the randomly assigned node parameters generally result in a large variation in classification accuracy across different runs with the same number of hidden nodes (Pal, Maxwell, & Warner, 2013). Also, the optimal dimensionality of the ELM feature space varies across applications and is usually determined by experience or a time-consuming procedure (Iosifidis, Tefas, & Pitas, 2015). To overcome these problems, the kernel extension of ELM has been increasingly studied; it circumvents the calculation of the hidden layer outputs by inherently encoding them in a kernel matrix (Huang, Zhou, Ding, & Zhang, 2012).
Instead of using a single kernel, ELM with multi-kernel learning has recently arisen and is able to achieve improved classification performance by combining different kernels (Liu, Wang, Huang, Zhang, & Yin, 2015).
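The two steps of the basic (non-kernel) ELM described above, random mapping followed by a least-squares output layer, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the class name `BasicELM`, the sigmoid activation, the standard normal initialization, and the default of 50 hidden nodes are all hypothetical choices.

```python
import numpy as np

class BasicELM:
    """Illustrative single hidden layer ELM (sigmoid activation is an assumption)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Step 1: random mapping of the inputs to the ELM feature space.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        # Hidden weights and biases are drawn once and never tuned.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Step 2: output weights via the Moore-Penrose pseudoinverse,
        # i.e. the minimum-norm least-squares solution of H @ beta = T.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

For a two-class problem with targets coded as -1 and +1, the predicted class is the sign of `predict(X)`. Changing `seed` changes the random hidden layer, which is the source of the run-to-run variation in accuracy noted above.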
Inspired by these studies, we propose a multi-kernel ELM (MKELM)-based method for accurate classification of EEG associated with MI tasks in BCI applications. Two different types of kernels, i.e., a Gaussian kernel and a polynomial kernel, are exploited to map the original CSP features to different nonlinear feature spaces. The two nonlinear feature spaces provide richer discriminant information that may be complementary. Accordingly, we integrate them using a multi-kernel learning strategy to achieve more robust classification of EEG in MI tasks. With two public EEG datasets, an extensive experimental comparison is carried out between the proposed MKELM-based method and several competing approaches. The experimental results demonstrate that the MKELM method is a promising candidate for the development of an improved MI-based BCI.
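To make the idea of combining a Gaussian and a polynomial kernel concrete, the sketch below builds a weighted sum of the two kernels and fits a regularized kernel ELM on it. It is a simplified illustration, not the proposed MKELM: here the combination weight is fixed by hand, whereas a multi-kernel learning strategy would learn it from data. The function names and the parameters `sigma`, `degree`, `offset`, `weight`, and `C` are assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Squared Euclidean distances between all rows of A and all rows of B.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))

def polynomial_kernel(A, B, degree=2, offset=1.0):
    return (A @ B.T + offset) ** degree

def combined_kernel(A, B, weight=0.5):
    # Fixed convex combination of the two kernels (assumption: weight not learned).
    return weight * gaussian_kernel(A, B) + (1 - weight) * polynomial_kernel(A, B)

def fit_kernel_elm(X, T, C=1.0, weight=0.5):
    # Kernel ELM output coefficients: alpha = (K + I / C)^{-1} T.
    K = combined_kernel(X, X, weight)
    return np.linalg.solve(K + np.eye(len(X)) / C, T)

def predict_kernel_elm(X_train, alpha, X_new, weight=0.5):
    # Decision values for new samples: k(x)^T alpha.
    return combined_kernel(X_new, X_train, weight) @ alpha
```

In use, `X` would hold the CSP features of the training trials and `T` the class labels coded as -1 and +1; the sign of `predict_kernel_elm(...)` gives the predicted class. No hidden layer outputs are computed explicitly, since the kernel matrix takes their place.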
The rest of the paper is structured as follows. In Section 2, the feature extraction and classification procedures are described, some basic concepts of ELM are briefly reviewed, and multi-kernel ELM is introduced. Extensive experimental results are given in Section 3, followed by a discussion in Section 4. Finally, conclusions are given in Section 5.
Feature extraction
Common spatial pattern (CSP) has proven to be an effective method for feature extraction in classifying two classes of motor imagery EEG data. Let $X_{i,1}$ and $X_{i,2} \in \mathbb{R}^{C \times P}$ denote the EEG samples of the two classes recorded from the i-th trial, with C and P being the number of channels and samples, respectively. Assume both EEG samples have been bandpass filtered at a specified frequency band and mean-removed. CSP aims at finding spatial filters to transform the EEG data so that the ratio of data
Data description
Two public EEG datasets were used for the experimental study. The first dataset is dataset IVa from BCI Competition III. The EEG data were collected at a sampling rate of 100 Hz from 118 electrodes for five subjects (named “aa”, “al”, “av”, “aw”, and “ay”) during the imagination of right hand or foot movements. Each subject completed 280 trials (half for each class of MI) and each trial lasted for 3.5 s. See http://www.bbci.de/competition/iii/ for more details about the dataset.
Discussion
The pros and cons of different classification algorithms are summarized in Table 3. Similar to ELM, the well-known MLP has universal approximation capability for continuous functions (Tang, Deng, & Huang, 2016). However, the hidden layer weights of MLP need to be tuned, typically by time-consuming backpropagation (a gradient descent-based algorithm). SVM can deal with high dimensional data and generally provides good generalization performance, but its parameter selection is data
Conclusions
In this study, we proposed a multi-kernel ELM-based method for EEG classification in motor-imagery BCIs. Two different types of kernels, i.e., a Gaussian kernel and a polynomial kernel, were exploited to map the original CSP features to different nonlinear feature spaces that provide richer discriminant information. With multi-kernel learning, the two nonlinear feature spaces were integrated to achieve more robust classification of EEG. An extensive experimental comparison was carried out on two
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant nos. 91420302, 61573142, 61673124. This work was also supported by the Fundamental Research Funds for the Central Universities WH1516018, Shanghai Chenguang Program under Grant 14CG3 and Shanghai Natural Science Foundation under Grant 16ZR1407500, the Programme of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017, the MES RF grant 14.756.31.0001, and the
References (68)
A new method for expert target recognition system: genetic wavelet extreme learning machine (GAWELM)
Expert Systems with Applications
(2013)
A new automatic target recognition system based on wavelet extreme learning machine
Expert Systems with Applications
(2012)
Single-trial analysis and classification of ERP components – A tutorial
NeuroImage
(2011)
Sales forecasting system based on gray extreme learning machine with Taguchi method in retail industry
Expert Systems with Applications
(2011)
Recursive projection twin support vector machine via within-class variance minimization
Pattern Recognition
(2011)
Tensor decomposition of EEG signals: A brief review
Journal of Neuroscience Methods
(2015)
A discriminative subject-specific spatio-spectral filter selection approach for EEG based motor-imagery task classification
Expert Systems with Applications
(2016)
Exploring dimensionality reduction of EEG features in motor imagery task classification
Expert Systems with Applications
(2014)
Trends in extreme learning machines: A review
Neural Networks
(2015)
Extreme learning machine: Theory and applications
Neurocomputing
(2006)
On the kernel extreme learning machine classifier
Pattern Recognition Letters
A review of classification algorithms for EEG-based brain-computer interfaces
Journal of Neural Engineering
Kernel-based extreme learning machine for remote-sensing image classification
Remote Sensing Letters
Optimized motor imagery paradigm based on imagining Chinese characters writing movement
IEEE Transactions on Neural Systems and Rehabilitation Engineering
Sparse representation-based classification scheme for motor imagery-based brain-computer interface systems
Journal of Neural Engineering
Extreme learning machine for multilayer perceptron systems
IEEE Transactions on Neural Networks and Learning Systems
Multitask diagnosis for autism spectrum disorders using multimodality features: A multicenter study
Human Brain Mapping
Sparse Bayesian extreme learning machine and its application to biofuel engine performance prediction
Neurocomputing
A hierarchical Bayesian approach for learning sparse spatio-temporal decompositions of multichannel EEG
NeuroImage
Probabilistic common spatial patterns for multichannel EEG analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence
An efficient multiple-kernel learning for pattern classification
Expert Systems with Applications
Surfing the internet with a BCI mouse
Journal of Neural Engineering
Sparse Bayesian multiway canonical correlation analysis for EEG pattern recognition analysis
Neurocomputing
Sparse Bayesian classification of EEG for brain-computer interface
IEEE Transactions on Neural Networks and Learning Systems
Kernelization of tensor-based models for multiway data analysis: Processing of multidimensional structured data
IEEE Signal Processing Magazine
Optimizing the channel selection and classification accuracy in EEG-based BCI
IEEE Transactions on Biomedical Engineering
Optimizing spatial filters by minimizing within-class dissimilarities in electroencephalogram-based brain-computer interface
IEEE Transactions on Neural Networks and Learning Systems
Optimizing spatial filters for robust EEG single-trial analysis
IEEE Signal Processing Magazine
High-speed spelling with a noninvasive brain-computer interface
Proceedings of the National Academy of Sciences of the United States of America
Joint blind source separation for neurophysiological data analysis: Multiset and multimodal methods
IEEE Signal Processing Magazine
An improved robust and sparse twin support vector regression via linear programming
Soft Computing
Learning capability and storage capacity of two-hidden-layer feedforward networks
IEEE Transactions on Neural Networks
An insight into extreme learning machines: Random neurons, random features and kernels
Cognitive Computation
Universal approximation using incremental constructive feedforward networks with random hidden nodes
IEEE Transactions on Neural Networks