Elsevier

Expert Systems with Applications

Volume 96, 15 April 2018, Pages 302-310

Multi-kernel extreme learning machine for EEG classification in brain-computer interfaces

https://doi.org/10.1016/j.eswa.2017.12.015

Highlights

  • A multi-kernel extreme learning machine-based method is proposed for EEG classification.

  • Complementary information from different kernels is integrated to improve accuracy.

  • An extensive experimental comparison confirms the superiority of the proposed method.

Abstract

One of the most important issues in developing a motor-imagery-based brain-computer interface (BCI) is how to design a powerful classifier with strong generalization capability. The extreme learning machine (ELM) has recently been shown to be comparable to, or more efficient than, the support vector machine for many pattern recognition problems. In this study, we propose a multi-kernel ELM (MKELM)-based method for motor imagery electroencephalogram (EEG) classification. The kernel extension of ELM provides an elegant way to circumvent calculation of the hidden layer outputs by implicitly encoding them in a kernel matrix. We investigate the effects of two different kernel functions (i.e., the Gaussian kernel and the polynomial kernel) on the performance of kernel ELM. The MKELM method is then developed by integrating these two types of kernels with a multi-kernel learning strategy, which can effectively exploit complementary information from multiple nonlinear feature spaces for more robust EEG classification. An extensive experimental comparison on two public EEG datasets indicates that the MKELM method achieves higher classification accuracy than the other competing algorithms. These results confirm the superiority of the proposed MKELM-based method for accurate classification of motor imagery EEG in BCI applications. Our method also provides a promising, general solution for exploiting complex, nonlinear information in various applications in the fields of expert and intelligent systems.

Introduction

A brain-computer interface (BCI) is an advanced technique that establishes direct communication between the human brain and a computer (Jin, Sellers, Zhou, Zhang, Wang, Cichocki, 2015, Li, Pan, Long, Yu, Wang, Yu, Wu, 2016, Wolpaw, Birbaumer, McFarland, Pfurtscheller, Vaughan, 2002). By recognizing a task-related electroencephalogram (EEG) pattern, a BCI translates the user's mental state into computer commands and offers a promising approach to restoring environmental control to people with disabilities (Wang et al., 2016). Currently, the most widely adopted EEG patterns for BCI development include sensorimotor rhythms (SMRs), event-related potentials, and steady-state visual evoked potentials (Chen, Wang, Nakanishi, Gao, Jung, Gao, 2015, Jiao, Zhang, Wang, Wang, Jin, Wang, 2017, Jin, Daly, Zhang, Wang, Cichocki, 2014, Ma, Zhang, Cichocki, Matsuno, 2015, Pfurtscheller, Brunner, Schlögl, Lopes, 2006, Shi, Wang, Zhang, 2015, Yu, Li, Long, Gu, 2012, Zhang, Zhou, Jin, Wang, Cichocki, 2014). Event-related desynchronization (ERD) is a significant decrease in SMR power over the contralateral sensorimotor area during the imagination of unilateral hand movements. Accordingly, motor-imagery (MI)-based BCIs are designed to detect the intended commands by classifying MI tasks according to ERD features (Yu et al., 2015).
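To make the notion of a "power decrease" concrete, ERD is commonly quantified as a percentage change of band power during the task interval, A, relative to a reference interval, R: ERD% = (A − R) / R × 100, so negative values indicate desynchronization. The sketch below is illustrative only and is not part of the authors' pipeline: it assumes the signals have already been band-pass filtered to the SMR band, uses the mean squared amplitude as band power, and the function names are hypothetical.

```python
from typing import Sequence


def band_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of an (already band-pass filtered) segment."""
    if not samples:
        raise ValueError("segment must contain at least one sample")
    return sum(x * x for x in samples) / len(samples)


def erd_percent(reference: Sequence[float], activity: Sequence[float]) -> float:
    """ERD% = (A - R) / R * 100; negative values indicate a power decrease (ERD)."""
    r = band_power(reference)
    a = band_power(activity)
    return (a - r) / r * 100.0


if __name__ == "__main__":
    # Hypothetical band-passed C3 samples: amplitude 2.0 at rest, 1.0 during imagery.
    rest = [2.0, -2.0] * 50      # R = 4.0
    imagery = [1.0, -1.0] * 50   # A = 1.0
    print(erd_percent(rest, imagery))  # -75.0
```

A drop from 4.0 to 1.0 in band power is reported as −75 %, i.e., a strong desynchronization over the recorded site.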

A direct approach to recognizing ERD features is to measure the variance difference between the left and right hemispheres (i.e., between electrodes C3 and C4). However, this simple method is likely to yield poor classification accuracy if the signals at the two electrodes are contaminated by noise. To date, extensive research efforts have been devoted to improving EEG feature extraction and classification for BCI applications (Cong, Lin, Kuang, Gong, Astikainen, Ristaniemi, 2015, Das, Suresh, Sundararajan, 2016, García-Laencina, Rodríguez-Bermudez, Roca-Dorda, 2014, Liu, Yu, Wu, Gu, Li, 2015, Nguyen, Khosravi, Creighton, Nahavandi, 2015, da Silveira, Kozakevicius, Rodrigues, 2016, Wu, Chen, Gao, Brown, 2011, Zhang, Zhou, Jin, Zhao, Wang, Cichocki, 2016, Zhou, Zhao, Zhang, Adalı, Xie, Cichocki, 2016). Common spatial pattern (CSP) seeks spatial filters that project multichannel EEG so as to maximize the variance of the projected signal for one class while minimizing it for the other class. In recent years, CSP and its variants have been the most popular choice for robust feature extraction in MI-related EEG classification (Arvaneh, Guan, Ang, Quek, 2013, Blankertz, Tomioka, Lemm, Kawanabe, Müller, 2008, Lotte, Guan, 2011, Wu, Chen, Gao, Li, Brown, Gao, 2015, Wu, Wu, Gao, Liu, Li, Gao, 2014).
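For reference, the simple hemispheric feature and the CSP criterion described above can be written as follows. This is a standard textbook formulation offered as a hedged summary, not necessarily the exact formulation used in the paper's Section 2; the symbols $\Sigma_k$, $N_k$, $m$ and the trace normalization are our assumptions.

```latex
% Simple hemispheric feature for trial X: log-variance difference of C3 and C4
f_{\mathrm{simple}} = \log \operatorname{var}(\mathbf{x}_{C3}) - \log \operatorname{var}(\mathbf{x}_{C4})

% Average trace-normalized spatial covariance of class k over its N_k trials
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N_k}
  \frac{\mathbf{X}_{i,k}\mathbf{X}_{i,k}^{\top}}{\operatorname{tr}(\mathbf{X}_{i,k}\mathbf{X}_{i,k}^{\top})},
  \qquad k \in \{1, 2\}

% CSP criterion: projected variance of class 1 relative to class 2
J(\mathbf{w}) = \frac{\mathbf{w}^{\top}\Sigma_1\mathbf{w}}{\mathbf{w}^{\top}\Sigma_2\mathbf{w}}

% Its stationary points solve the generalized eigenvalue problem
\Sigma_1\mathbf{w} = \lambda\,\Sigma_2\mathbf{w}

% Keep the m eigenvectors with the largest and the m with the smallest \lambda,
% W = [\mathbf{w}_1, \dots, \mathbf{w}_{2m}], and use normalized log-variances as features
f_j = \log \frac{\operatorname{var}(\mathbf{w}_j^{\top}\mathbf{X})}
  {\sum_{l=1}^{2m} \operatorname{var}(\mathbf{w}_l^{\top}\mathbf{X})},
  \qquad j = 1, \dots, 2m
```

At an eigenvector, $J(\mathbf{w}) = \lambda$, so large eigenvalues give filters whose output variance is high for class 1 and low for class 2, and small eigenvalues give the reverse; this is why both ends of the spectrum are kept.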

Another important issue for EEG classification of MI tasks is how to design a powerful classifier with strong generalization capability (Lotte, Congedo, Lecuyer, Lamarche, Arnaldi, 2007, Zhang et al., 2014, Zhang, Wang, Jin, Wang, 2017). Linear discriminant analysis (LDA) is a relatively simple algorithm that generally works well for pattern classification if the sample covariance matrices of the different classes are similar (Krusienski et al., 2006). However, this assumption may not hold for ERD features, and LDA can hardly achieve good classification accuracy there, partly because of possible overfitting. Several regularization-based classification algorithms have recently been developed and increasingly applied to EEG analysis. For instance, shrinkage LDA remedies ill-conditioned covariance matrices with a shrinkage covariance estimator and significantly enhances classification accuracy, especially in small-sample-size scenarios (Blankertz, Lemm, Treder, Haufe, & Müller, 2011). The well-known support vector machine (SVM) adopts soft-margin regularization to achieve good generalization capability (Lotte et al., 2007), and its kernel extension usually performs well in classifying nonlinear features of EEG signals. Currently, the combination of CSP and SVM has become one of the most popular methods for MI classification (Qiu, Allison, Jin, Zhang, Wang, Li, Cichocki, 2017, Zhang, Zhou, Jin, Wang, Cichocki, 2015). In addition, a sparse representation-based scheme using ℓ1-norm regularization has been proposed for MI classification and has shown better performance than LDA (Li, Yu, Bi, Xu, Gu, Amari, 2014, Shin, Lee, Lee, Lee, 2012).
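The shrinkage idea mentioned above can be illustrated with the standard form of the estimator, which blends the sample covariance $\hat\Sigma$ with a scaled identity: $\tilde\Sigma = (1-\gamma)\hat\Sigma + \gamma\nu I$, where $\nu = \operatorname{tr}(\hat\Sigma)/d$ is the average eigenvalue. In this minimal sketch, $\gamma$ is a user-supplied value; the analytic choice of $\gamma$ used in practice (e.g., Ledoit–Wolf-type formulas) is deliberately omitted, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(samples: Matrix) -> Matrix:
    """Unbiased sample covariance of row-wise observations (n samples x d features)."""
    n, d = len(samples), len(samples[0])
    if n < 2:
        raise ValueError("need at least two samples")
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in samples) / (n - 1)
         for b in range(d)]
        for a in range(d)
    ]


def shrink_covariance(cov: Matrix, gamma: float) -> Matrix:
    """Shrink towards nu * I, where nu = trace / d is the mean eigenvalue."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    d = len(cov)
    nu = sum(cov[i][i] for i in range(d)) / d
    return [
        [(1.0 - gamma) * cov[a][b] + (gamma * nu if a == b else 0.0) for b in range(d)]
        for a in range(d)
    ]


if __name__ == "__main__":
    # Two nearly collinear features and few samples: the sample covariance is near-singular.
    x = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9]]
    cov = sample_covariance(x)
    print(cov)
    print(shrink_covariance(cov, gamma=0.2))
```

Shrinkage preserves the trace while lifting the smallest eigenvalues, which is what makes the estimate better conditioned when there are few trials per class.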

In recent years, the extreme learning machine (ELM) has attracted increasing attention from researchers in the pattern recognition field (Avci, 2013, Avci, Coteli, 2012, Huang, Huang, Song, You, 2015). ELM was originally proposed by Huang, Zhu, and Siew (2006) for training single-hidden-layer feedforward neural networks, such as the multilayer perceptron (MLP) with one hidden layer. The hidden layer weights in ELM are randomly initialized and then kept fixed, without iterative tuning. The output weights are obtained analytically through the Moore–Penrose generalized inverse of the hidden-layer output matrix, so that ELM achieves not only the smallest training error but also the smallest norm of output weights. ELM thus consists of two processing steps: (1) a random mapping of the input space to the ELM feature space and (2) learning an appropriate linear projection for classification. Some empirical studies have suggested that ELM provides comparable or even better generalization capability than SVM and its variants (Chen, Ou, 2011, Huang, 2014). Furthermore, a probabilistic version of ELM has been developed that estimates the probability distribution of the output values instead of directly fitting the data, thereby alleviating overfitting. Our previous study (Zhang, Jin, Wang, & Wang, 2016) validated the effectiveness of Bayesian ELM for EEG classification. However, the randomly assigned node parameters generally cause large variations in classification accuracy across different runs with the same number of hidden nodes (Pal, Maxwell, & Warner, 2013). Also, the optimal dimensionality of the ELM feature space varies across applications and is usually determined empirically or by a time-consuming search (Iosifidis, Tefas, & Pitas, 2015). To overcome these problems, the kernel extension of ELM has been increasingly studied; it circumvents calculation of the hidden layer outputs by implicitly encoding them in a kernel matrix (Huang, Zhou, Ding, & Zhang, 2012). Rather than using a single kernel, ELM with multi-kernel learning has recently emerged and can achieve improved classification performance by combining different kernels (Liu, Wang, Huang, Zhang, & Yin, 2015).
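The two-step structure of ELM described above (a random, untuned hidden layer followed by an analytic linear readout) can be sketched in pure Python. This is a minimal illustration under our own assumptions: a sigmoid activation, input weights and biases drawn uniformly from [−1, 1], binary labels in {0, 1} mapped to targets ±1, and the ridge-regularized readout $\beta = (H^\top H + I/C)^{-1} H^\top T$, which tends to the Moore–Penrose solution $H^\dagger T$ as $C \to \infty$. The helpers `solve_linear` and `ElmClassifier` are hypothetical names.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve_linear(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class ElmClassifier:
    """Binary ELM: random, untuned sigmoid hidden layer + ridge-regularized readout."""

    def __init__(self, n_hidden: int = 10, c: float = 100.0, seed: int = 0) -> None:
        self.n_hidden = n_hidden
        self.c = c
        self.rng = random.Random(seed)

    def _hidden(self, x: Sequence[Sequence[float]]) -> Matrix:
        # Step 1: random mapping of the input space into the ELM feature space.
        return [
            [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(wj, xi)) + bj)))
             for wj, bj in zip(self.weights, self.biases)]
            for xi in x
        ]

    def fit(self, x: Sequence[Sequence[float]], labels: Sequence[int]) -> "ElmClassifier":
        d = len(x[0])
        self.weights = [[self.rng.uniform(-1.0, 1.0) for _ in range(d)]
                        for _ in range(self.n_hidden)]
        self.biases = [self.rng.uniform(-1.0, 1.0) for _ in range(self.n_hidden)]
        h = self._hidden(x)
        ht = transpose(h)
        # Step 2: beta = (H^T H + I / C)^(-1) H^T T; the only trained parameters.
        lhs = matmul(ht, h)
        for i in range(self.n_hidden):
            lhs[i][i] += 1.0 / self.c
        targets = [[1.0 if y == 1 else -1.0] for y in labels]
        self.beta = solve_linear(lhs, matmul(ht, targets))
        return self

    def predict(self, x: Sequence[Sequence[float]]) -> List[int]:
        scores = matmul(self._hidden(x), self.beta)
        return [1 if s[0] > 0.0 else 0 for s in scores]


if __name__ == "__main__":
    x = [[-3.0 + 0.1 * i, -3.0 - 0.1 * i] for i in range(5)] + \
        [[3.0 + 0.1 * i, 3.0 - 0.1 * i] for i in range(5)]
    y = [0] * 5 + [1] * 5
    print(ElmClassifier(n_hidden=10, seed=1).fit(x, y).predict(x))
```

Because the hidden weights depend on the seed, different seeds can give different decision boundaries; this is exactly the run-to-run variability that motivates the kernel extension discussed above.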

Inspired by these studies, we propose a multi-kernel ELM (MKELM)-based method for accurate classification of EEG associated with MI tasks in BCI applications. Two different types of kernels, i.e., the Gaussian kernel and the polynomial kernel, are exploited to map the original CSP features to different nonlinear feature spaces. The two nonlinear feature spaces provide richer discriminant information that may be complementary. Accordingly, we integrate them using a multi-kernel learning strategy to achieve more robust classification of EEG in MI tasks. Using two public EEG datasets, we carry out an extensive experimental comparison between the proposed MKELM-based method and several other competing approaches. The experimental results demonstrate that the MKELM method is a promising candidate for the development of improved MI-based BCIs.
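A minimal sketch of how a Gaussian and a polynomial kernel can be combined inside kernel ELM follows. It uses the kernel ELM output $f(x) = [K(x, x_1), \dots, K(x, x_N)]\,(I/C + \Omega)^{-1} T$ with $\Omega_{ij} = K(x_i, x_j)$, and a convex combination $K = \mu K_{\mathrm{G}} + (1-\mu) K_{\mathrm{P}}$. Important assumptions: the weight $\mu$ is fixed by hand here, whereas the paper learns the combination with its multi-kernel learning strategy (not reproduced); the Gaussian width convention, the polynomial offset and degree, and all names (`KernelElm`, `combined_kernel`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """exp(-||x - y||^2 / (2 sigma^2)); the width convention is an assumption."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def polynomial_kernel(x: Vector, y: Vector, offset: float = 1.0, degree: int = 2) -> float:
    """(x . y + offset)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + offset) ** degree


def combined_kernel(mu: float) -> Kernel:
    """Convex combination mu * K_gauss + (1 - mu) * K_poly with a hand-picked mu."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return lambda x, y: mu * gaussian_kernel(x, y) + (1.0 - mu) * polynomial_kernel(x, y)


def solve_linear(a: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


class KernelElm:
    """Binary kernel ELM: no explicit hidden layer, only the N x N kernel matrix."""

    def __init__(self, kernel: Kernel, c: float = 10.0) -> None:
        self.kernel = kernel
        self.c = c

    def fit(self, x: Sequence[Vector], labels: Sequence[int]) -> "KernelElm":
        n = len(x)
        # (I / C + Omega) alpha = T, with targets +1 / -1 for labels 1 / 0.
        system = [[self.kernel(x[i], x[j]) + (1.0 / self.c if i == j else 0.0)
                   for j in range(n)] for i in range(n)]
        targets = [1.0 if y == 1 else -1.0 for y in labels]
        self.alpha = solve_linear(system, targets)
        self.train = [list(v) for v in x]
        return self

    def decision(self, v: Vector) -> float:
        return sum(a * self.kernel(v, xi) for a, xi in zip(self.alpha, self.train))

    def predict(self, xs: Sequence[Vector]) -> List[int]:
        return [1 if self.decision(v) > 0.0 else 0 for v in xs]


if __name__ == "__main__":
    # XOR is not linearly separable, but the combined kernel handles it.
    xor_x = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    xor_y = [0, 0, 1, 1]
    print(KernelElm(combined_kernel(0.5)).fit(xor_x, xor_y).predict(xor_x))
```

Since the sum of valid kernels with non-negative weights is again a valid kernel, the combined matrix plus $I/C$ stays positive definite and the linear system is always solvable.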

The rest of the paper is structured as follows. Section 2 describes the feature extraction and classification procedures, briefly reviews some basic concepts of ELM, and introduces multi-kernel ELM. Experimental results are given in Section 3, and a discussion is given in Section 4. Finally, Section 5 draws some conclusions.

Section snippets

Feature extraction

Common spatial pattern (CSP) has proven to be an effective feature extraction method for classifying two classes of motor imagery EEG data. Let $\mathbf{X}_{i,1}, \mathbf{X}_{i,2} \in \mathbb{R}^{C \times P}$ denote the EEG samples of the two classes recorded in the $i$-th trial, with $C$ and $P$ being the numbers of channels and samples, respectively. Both EEG samples are assumed to have been band-pass filtered in a specified frequency band and mean-removed. CSP aims to find a spatial filter $\mathbf{w} \in \mathbb{R}^{C}$ that transforms the EEG data so that the ratio of data

Data description

Two public EEG datasets were used in the experimental study. The first is dataset IVa from BCI Competition III. The EEG data were recorded from 118 electrodes at a sampling rate of 100 Hz for five subjects (named “aa”, “al”, “av”, “aw”, and “ay”) while they imagined right-hand or foot movements. Each subject completed 280 trials (140 per MI class), and each trial lasted 3.5 s. See http://www.bbci.de/competition/iii/ for more details about the dataset.
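At 100 Hz, a 3.5 s trial spans 100 × 3.5 = 350 samples, so if the full trial window is used, each trial is a 118 × 350 matrix (C = 118, P = 350 in the CSP notation of Section 2). The sketch below cuts such epochs from a continuous channels × time recording; the cue-onset sample indices are hypothetical placeholders, since the dataset's marker format is not described here, and the function names are our own.

```python
from typing import List, Sequence

FS_HZ = 100          # sampling rate stated for dataset IVa
TRIAL_SECONDS = 3.5  # trial length stated for dataset IVa


def samples_per_trial(fs_hz: int = FS_HZ, seconds: float = TRIAL_SECONDS) -> int:
    """Number of samples in one trial window."""
    return int(round(fs_hz * seconds))


def cut_epochs(recording: Sequence[Sequence[float]],
               onsets: Sequence[int]) -> List[List[List[float]]]:
    """Slice C x P epochs from a C x T continuous recording at the given onsets."""
    p = samples_per_trial()
    total = len(recording[0])
    epochs = []
    for start in onsets:
        if start < 0 or start + p > total:
            raise ValueError(f"epoch starting at {start} exceeds the recording")
        epochs.append([list(channel[start:start + p]) for channel in recording])
    return epochs


if __name__ == "__main__":
    n_channels, n_samples = 118, 2000
    recording = [[0.0] * n_samples for _ in range(n_channels)]  # placeholder data
    epochs = cut_epochs(recording, onsets=[0, 800])  # hypothetical cue onsets
    print(len(epochs), len(epochs[0]), len(epochs[0][0]))  # 2 118 350
```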

Discussion

The pros and cons of different classification algorithms are summarized in Table 3. Like ELM, the well-known MLP has universal approximation capability for continuous functions (Tang, Deng, & Huang, 2016). However, the hidden layer weights of an MLP typically need to be tuned by time-consuming back-propagation (a gradient-descent-based algorithm). SVM can handle high-dimensional data and generally provides good generalization performance, but its parameter selection is data

Conclusions

In this study, we proposed a multi-kernel ELM-based method for EEG classification in motor-imagery BCIs. Two different types of kernels, i.e., the Gaussian kernel and the polynomial kernel, were exploited to map the original CSP features to different nonlinear feature spaces that provide richer discriminant information. Through multi-kernel learning, the two nonlinear feature spaces were integrated to achieve more robust EEG classification. An extensive experimental comparison was carried out on two

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant nos. 91420302, 61573142, and 61673124. This work was also supported by the Fundamental Research Funds for the Central Universities WH1516018, the Shanghai Chenguang Program under Grant 14CG3, the Shanghai Natural Science Foundation under Grant 16ZR1407500, the Programme of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017, the MES RF grant 14.756.31.0001, and the

References (68)

  • A. Iosifidis et al.

    On the kernel extreme learning machine classifier

    Pattern Recognition Letters

    (2015)
  • F. Lotte et al.

    A review of classification algorithms for EEG-based brain-computer interfaces

    Journal of Neural Engineering

    (2007)
  • M. Pal et al.

    Kernel-based extreme learning machine for remote-sensing image classification

    Remote Sensing Letters

    (2013)
  • Z. Qiu et al.

    Optimized motor imagery paradigm based on imagining Chinese characters writing movement

    IEEE Transactions on Neural Systems and Rehabilitation Engineering

    (2017)
  • Y. Shin et al.

    Sparse representation-based classification scheme for motor imagery-based brain-computer interface systems

    Journal of Neural Engineering

    (2012)
  • J. Tang et al.

    Extreme learning machine for multilayer perceptron systems

    IEEE Transactions on Neural Networks and Learning Systems

    (2016)
  • J. Wang et al.

    Multitask diagnosis for autism spectrum disorders using multimodality features: A multicenter study

    Human Brain Mapping

    (2017)
  • K. Wong et al.

    Sparse Bayesian extreme learning machine and its application to biofuel engine performance prediction

    Neurocomputing

    (2015)
  • W. Wu et al.

    A hierarchical Bayesian approach for learning sparse spatio-temporal decompositions of multichannel EEG

    NeuroImage

    (2011)
  • W. Wu et al.

    Probabilistic common spatial patterns for multichannel EEG analysis

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2015)
  • C.Y. Yeh et al.

    An efficient multiple-kernel learning for pattern classification

    Expert Systems with Applications

    (2013)
  • T. Yu et al.

    Surfing the internet with a BCI mouse

    Journal of Neural Engineering

    (2012)
  • Y. Zhang et al.

    Sparse Bayesian multiway canonical correlation analysis for EEG pattern recognition analysis

    Neurocomputing

    (2017)
  • Y. Zhang et al.

    Sparse Bayesian classification of EEG for brain-computer interface

    IEEE Transactions on Neural Networks and Learning Systems

    (2016)
  • Q. Zhao et al.

    Kernelization of tensor-based models for multiway data analysis: Processing of multidimensional structured data

    IEEE Signal Processing Magazine

    (2013)
  • M. Arvaneh et al.

    Optimizing the channel selection and classification accuracy in EEG-based BCI

    IEEE Transactions on Biomedical Engineering

    (2011)
  • M. Arvaneh et al.

    Optimizing spatial filters by minimizing within-class dissimilarities in electroencephalogram-based brain-computer interface

    IEEE Transactions on Neural Networks and Learning Systems

    (2013)
  • B. Blankertz et al.

    Optimizing spatial filters for robust EEG single-trial analysis

    IEEE Signal Processing Magazine

    (2008)
  • X. Chen et al.

    High-speed spelling with a noninvasive brain-computer interface

    Proceedings of the National Academy of Sciences of the United States of America

    (2015)
  • X. Chen et al.

    Joint blind source separation for neurophysiological data analysis: Multiset and multimodal methods

    IEEE Signal Processing Magazine

    (2016)
  • X. Chen et al.

    An improved robust and sparse twin support vector regression via linear programming

    Soft Computing

    (2014)
  • G.-B. Huang

    Learning capability and storage capacity of two-hidden-layer feedforward networks

    IEEE Transactions on Neural Networks

    (2003)
  • G.-B. Huang

    An insight into extreme learning machines: Random neurons, random features and kernels

    Cognitive Computation

    (2014)
  • G.-B. Huang et al.

    Universal approximation using incremental constructive feedforward networks with random hidden nodes

    IEEE Transactions on Neural Networks

    (2006)