P300 event-related potential detection using one-dimensional convolutional capsule networks

doi:10.1016/j.eswa.2021.114701

Expert Systems with Applications

Volume 174, 15 July 2021, 114701

https://doi.org/10.1016/j.eswa.2021.114701

Highlights

• A new method for P300 ERP detection, called 1D-CapsNet, is proposed.
• Changing the convolution dimension allows better detection of P300 ERPs.
• Two classifiers based on 1D-CapsNet are proposed to make the model more practical for BCI.
• 1D-CapsNet outperforms other state-of-the-art algorithms.

Abstract

The main challenge in creating a brain-computer interface (BCI) is establishing an effective brain signal recognition model suitable for achieving direct communication between humans and computers. Recently, various deep learning-based methods have been proposed to improve the detection of P300 event-related potentials (ERPs) for BCI. However, during the detection of P300 ERP signals, even electroencephalogram (EEG) signals from the same person are inconsistent and may be significantly distorted, which impairs the classification accuracy of many deep learning methods. Here, we propose a machine learning model based on a one-dimensional convolutional capsule network (1D-CapsNet). This network topology can effectively detect P300 ERP signals in the time domain, thereby achieving better detection performance than current convolutional neural network (CNN)-based methods. Two classifiers based on the 1D-CapsNet model are proposed, namely, 1D-CapsNet-64 and 1D-CapsNet-8, which are used for classifying EEG data with 64 and 8 electrodes, respectively. These two classifiers are tested and compared on dataset II of the third BCI competition. The results show that the 1D-CapsNet-64 classifier obtains the best character recognition rate (96%). The proposed method is superior to both state-of-the-art CNN-based methods and various traditional machine learning methods. The experimental results demonstrate the feasibility of the proposed method for detecting P300 ERP signals. The proposed method is expected to extend EEG signal pattern recognition and improve BCI design and applications.

Introduction

Recently, with the rapid development of artificial intelligence technology, research on human-computer intelligent interactions has become a focus in the field of ergonomics. Scholars have attempted to use interactive technologies such as speech recognition (Hinton et al., 2012), eye tracking (Black et al., 2018, Kim et al., 2019), gesture control (Morganti et al., 2012), and brain signals (Baloglu and Yildirim, 2019, Yıldırım et al., 2020) to improve and create new intelligent interactive experiences.

BCI is a direct communication pathway between the human brain and a computer that requires no physical interaction. It is also a cutting-edge and highly challenging type of interactive technology (Allison et al., 2007, Kostov and Polak, 2000). BCI can acquire, decode, and recognize brain signals (Schalk et al., 2004) and use the results to make decisions (Shih et al., 2012), which allows BCI to help people with severe motor disabilities such as spinal cord injuries or amyotrophic lateral sclerosis (ALS) (Chen et al., 2020) interact more effectively with computers and other smart devices (Nicolas-Alonso & Gomez-Gil, 2012). The core aspects of a BCI system are usually divided into four parts: recording signals generated by the user's brain, signal pre-processing, feature extraction, and brain signal classification. Because EEG collection is non-invasive and relies on relatively inexpensive equipment, EEG has become a common basis for building BCI systems. The EEG classification strategy depends on the stimulus itself, and mainly includes the following signal types: ERPs (Birbaumer et al., 1999; Jin, Chen, et al., 2020), steady-state evoked potential (SSEP) (Müller-Putz et al., 2006), motor imagery (MI) (Ha and Jeong, 2019, Jin et al., 2020, Jin et al., 2020, Jin et al., 2020) or slow cortical potential (SCP) (Pfurtscheller et al., 1997). The differences among these signal types mean that EEG signals require specific feature extraction methods and classification algorithms to achieve accurate classification.

Although neuroscience has provided knowledge and guidance on EEG detection and signal processing, machine learning algorithms make it possible to extract features and to model the variability of the signal over time and across subjects (Müller et al., 2008). Therefore, machine learning algorithms are widely used in EEG signal classification. Classical machine learning algorithms such as linear discriminant analysis (LDA) (Jin, Li, et al., 2020), naive Bayes (NB) (Lotte et al., 2018), support vector machine (SVM) (Rakotomamonjy & Guigue, 2008), hidden Markov model (HMM) (Obermaier et al., 2001, Zhong and Ghosh, 2002), and neural networks (NNs) (Cecotti & Gräser, 2008) have achieved various levels of success in EEG classification. Ensemble-based mixtures of modeling orders and evolutionary-based order fusion have also been applied to BCI recognition, with improved results (Atyabi et al., 2016). However, the accuracy of BCI systems based on P300 ERPs is unsatisfactory, and there is still room for improvement. Backpropagation neural networks were first applied to EEG pattern recognition by Hiraiwa et al. (1990), demonstrating that deep learning algorithms can be applied to EEG classification and BCI pattern recognition. Since then, various deep learning methods have been tested for EEG recognition and some have achieved good results (Thomas et al., 2017). In the classic example, Cecotti and Gräser introduced CNNs into BCI to detect P300 ERPs. They proposed seven CNN-based classifiers and evaluated their performance and network topology. The character recognition rate reached as high as 95.5%, outperforming the recognition rates of traditional machine learning methods (Cecotti & Gräser, 2011). Because recurrent neural network (RNN) models have achieved good results on sequence information recognition tasks (such as speech recognition) (Lipton et al., 2015, Yao et al., 2020), the long short-term memory (LSTM) network and gated recurrent unit (GRU) have also been applied to EEG recognition. However, the classification results seem to be similar to those of a CNN, while the RNN models require longer times for training and testing and their real-time performance is relatively poor (Joshi et al., 2018).

CNNs have achieved great successes in computer vision in recent years, but they also have some limitations (Sabour et al., 2017). First, if the test data are distorted (tilted, rotated, etc.), the CNN classification results will be adversely affected. Second, the purpose of pooling operations in CNNs is to establish position invariance rather than equivariance. Even when this approach works well, it discards positional information contained in the data. In addition, CNNs learn limited spatial information by expanding the pooling field of view without considering the core spatial relationships between data objects; thus, they tend to lose the spatial positioning of different components in the data, which also leads to a decline in CNN classification performance. To solve these problems, Hinton and colleagues proposed a new type of deep neural network architecture, called Capsule Networks (CapsNet), in the paper “Dynamic Routing Between Capsules” published at the end of 2017 (Sabour et al., 2017). CapsNet uses the concept of capsules to automatically learn various object features, and it also considers the core spatial relationships between objects to retain the component spaces they occupy. CapsNet achieved 55% and 98.5% accuracy rates when classifying the SVHN and MNIST datasets, respectively, outperforming the previous best CNNs, which was the best unsupervised classification result at the time (Kosiorek et al., 2019). CapsNet's excellent performance has led many scholars to try to apply the CapsNet model to other fields. In the BCI field, EEG signal data typically have a low signal-to-noise ratio (SNR). Electro-oculogram (EOG) and electromyogram (EMG) activity, power-line interference, and other sources generate a large amount of inconsistent and unstable interference. In addition, the P300 ERP signals are usually submerged in the EEG, making them difficult to distinguish manually in the raw signal. Therefore, BCI based on P300 ERP detection has many opportunities to benefit from CapsNet (Ha & Jeong, 2019).
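The pooling limitation described above can be illustrated with a minimal sketch. The function `max_pool_1d` and the two toy signals are hypothetical and are not taken from the paper; the sketch only shows that non-overlapping max pooling returns the same output for a peak at two different positions inside a window, so the position is no longer available to later layers.

```python
def max_pool_1d(signal, size):
    """Non-overlapping 1D max pooling: keep the largest value of each window."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


# Two toy signals (hypothetical values) with the same peak at different
# positions inside the first pooling window.
peak_early = [0, 9, 0, 0, 1, 0, 0, 0]
peak_late = [0, 0, 0, 9, 1, 0, 0, 0]

pooled_early = max_pool_1d(peak_early, size=4)
pooled_late = max_pool_1d(peak_late, size=4)

# Both inputs pool to the same values, so the layer after pooling cannot
# tell where inside the window the peak occurred.
print(pooled_early, pooled_late)
```

Capsule networks replace this pooling step with routing between vector-valued capsules, which is the motivation given in the text for applying them to distorted EEG signals.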

In this paper, P300 event-related potential EEG detection is studied as follows. (i) According to the experimental paradigm and requirements, the original EEG data are organized into training and test sets through preprocessing that mainly includes data cleaning, data subsampling, and data normalization, providing data support for the proposal and verification of the algorithm. (ii) On this basis, we introduce the theory of dynamic routing between capsules. The hyperparameters of the model are analyzed and the model is reconstructed with an improved topology; two classifiers based on the 1D-CapsNet model, 1D-CapsNet-64 and 1D-CapsNet-8, are then proposed to classify EEG data from 64 and 8 electrodes, respectively. (iii) The classification results are applied to character prediction in order to observe the classifier accuracy, character prediction rate, information transmission rate, and other indicators. (iv) The proposed method is compared with other advanced machine learning algorithms and shown to be feasible. The overall process of P300 signal detection is shown in Fig. 1.
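The subsampling and normalization steps in (i) can be sketched as follows. This is only an illustration under stated assumptions: the subsampling factor, the use of per-channel z-score normalization, and the helper names `subsample`, `zscore`, and `preprocess_epoch` are all hypothetical, and the data cleaning step (for example, filtering) is omitted.

```python
from statistics import mean, pstdev


def subsample(channel, factor):
    """Keep every `factor`-th sample of one channel (no anti-alias filter here)."""
    return channel[::factor]


def zscore(channel):
    """Scale one channel to zero mean and unit standard deviation."""
    mu = mean(channel)
    sigma = pstdev(channel)
    if sigma == 0:
        return [0.0 for _ in channel]
    return [(x - mu) / sigma for x in channel]


def preprocess_epoch(epoch, factor=2):
    """Subsample, then normalize, each channel of one epoch.

    `epoch` is a list of channels; each channel is a list of samples.
    The default factor of 2 is an assumption for illustration only.
    """
    return [zscore(subsample(channel, factor)) for channel in epoch]


# Toy epoch: 2 channels x 8 samples (hypothetical values, not real EEG).
epoch = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [10.0, 10.0, 12.0, 12.0, 14.0, 14.0, 16.0, 16.0],
]
clean = preprocess_epoch(epoch, factor=2)
print(len(clean), len(clean[0]))
```

Normalizing each channel separately keeps electrodes with different amplitude ranges on a comparable scale before they reach the classifier.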

Our contributions in this article are as follows:

(1) A P300 ERP detection method based on the CapsNet model is proposed for the first time and used for character recognition in the P300 speller.
(2) Although based on the existing CapsNet model, the CapsNet model used here is improved by adding a one-dimensional convolution so that it can decode the P300 ERP signal in the EEG. The resulting character recognition rate can reach 98%. The proposed network topology is studied in detail, and the most universal network topology is selected as the proposed model.
(3) Two classifiers for P300 ERP signal detection based on 1D-CapsNet are proposed, making our proposed model more practical for BCI implementations. These two classifiers are used to classify EEG data with 64 electrodes and 8 electrodes, and their performances are evaluated.

The remainder of this paper is organized as follows: Section 2 introduces the dataset used in this article and reviews previous related research. Section 3 describes the proposed method in detail. Section 4 reports the experimental results and provides an analysis of the network topology. Finally, a detailed discussion and conclusions are presented in Sections 5 and 6, respectively.

Section snippets

Experiments and dataset

The P300 wave is one of the main components of ERPs, which are mainly obtained from an EEG signal. The P300 wave is a positive deflection of the voltage that occurs approximately 300 ms after the brain is stimulated, such as by a flash. In general, the amplitude of the P300 ERP signal is highest near the parietal lobe and occipital lobe (PZ electrode). It is difficult to find P300 ERP signals and extract their features directly from raw EEG signals without data processing. Although the detection

Original CapsNet

The main difference between the CapsNet and a traditional CNN is that a CNN creates a deep network through continuous convolutional layers. In contrast, CapsNet embeds neurons that focus on the same category or attribute into a capsule, which is a group of neurons. The length of the activity vector of the capsule represents the probability of the existence of the entity, and the direction of the activity vector represents the instantiation parameter. This allows the capsule network to perceive
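The idea that a capsule's vector length encodes an existence probability can be sketched with the squash nonlinearity defined in Sabour et al. (2017). The snippet above does not show which form 1D-CapsNet uses, so this is an assumption based on the original CapsNet paper; the helper names `squash` and `length` and the toy vectors are hypothetical.

```python
import math


def squash(s, eps=1e-9):
    """Squash nonlinearity of Sabour et al. (2017).

    Shrinks the vector so its length lies in [0, 1) while keeping its
    direction: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    """
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)  # eps avoids division by zero
    return [scale * x for x in s]


def length(v):
    """Euclidean length, read as the probability that the entity is present."""
    return math.sqrt(sum(x * x for x in v))


weak = squash([0.1, 0.0])    # short input vector -> length close to 0
strong = squash([3.0, 4.0])  # long input vector  -> length close to 1
print(length(weak), length(strong))
```

Short vectors are pushed toward zero length and long vectors toward unit length, while the ratio between components (the direction) is unchanged.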

Parameter optimization and classifier selection

For the training and parameter optimization of 1D-CapsNet, we conducted experiments using a PC workstation equipped with an NVIDIA GeForce GTX 1070 GPU, an AMD Ryzen 7 1800X CPU, and 16 GB of RAM. The entire algorithm was implemented using the Python Keras neural network library.

In each 50 epochs of training, the Adam optimization method, which has a fast convergence speed and a good optimization effect, is used to update the parameters. By default, the learning rate is 0.001, $\beta_{1} = 0.9$, $\beta_{2} = 0.999$,
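For reference, a single Adam update with the quoted learning rate and $\beta$ values can be sketched in plain Python. This is a toy illustration, not the authors' Keras training code: the function `adam_step`, the scalar test loss, and the value of `eps` are assumptions (the text above is cut off before any further hyperparameter is stated).

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    lr, beta1 and beta2 match the values quoted in the text; eps is an
    assumed value, since the source is truncated before stating it.
    """
    m = beta1 * m + (1.0 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)               # bias correction for m
    v_hat = v / (1.0 - beta2 ** t)               # bias correction for v
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimize the toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 10001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
print(w)
```

Because of the bias correction, the very first step has a magnitude close to the learning rate regardless of the gradient scale, which is one reason Adam is often described as converging quickly.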

Discussion

In this paper, the feasibility of the CapsNet model in EEG detection is demonstrated via a large number of experiments. The original CapsNet model is improved by one-dimensional convolution and its network structure is modified to make it more suitable for P300 ERP detection. Compared with other methods based on machine learning and deep learning regarding performance and effectiveness, the experimental results in Section 4 show that the detection methods based on the 1D-CapsNet model

Conclusion

This study proposed a 1D-CapsNet method for P300 ERP detection based on the “Dynamic Routing Between Capsules” theory. The model combines the idea of one-dimensional convolution with the traditional CapsNet model to make it more suitable for EEG signal detection. Concretely, to make 1D-CapsNet more practical for BCI applications, the proposed method applies the 1D-CapsNet-64 and 1D-CapsNet-8 classifiers to the detection of P300 ERPs with different numbers of electrodes, and is compared with other

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Xiang Liu: Conceptualization, Writing - original draft, Methodology, Formal analysis, Software. Qingsheng Xie: Supervision, Conceptualization. Jian Lv: Software, Writing - review & editing. Haisong Huang: Visualization, Methodology. Weixing Wang: Data curation, Visualization.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under grants 52065010 and 51865004, the Science and Technology Top Talent Support Program Project of Guizhou Province under grant KY[2018]037, and in part by the Department of Education Project of Guizhou Province under grant YJSCXJH[2019]108.

References (52)

A. Atyabi et al. Mixture of autoregressive modeling orders and its implication on single trial EEG classification. Expert Systems with Applications (2016)
L.A. Farwell et al. Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and Clinical Neurophysiology (1988)
M. Liu et al. Deep learning based on Batch Normalization for P300 signal detection. Neurocomputing (2018)
E. Morganti et al. A smart watch with embedded sensors to recognize objects, grasps and forearm gestures. Procedia Engineering (2012)
K.R. Müller et al. Machine learning for real-time single-trial EEG-analysis: From brain-computer interfacing to mental state monitoring. Journal of Neuroscience Methods (2008)
B. Obermaier et al. Hidden Markov models for online classification of single trial EEG data. Pattern Recognition Letters (2001)
G. Pfurtscheller et al. EEG-based discrimination between imagination of right and left hand movement. Electroencephalography and Clinical Neurophysiology (1997)
J.J. Shih et al. Brain-computer interfaces in medicine. Mayo Clinic Proceedings (2012)
B.Z. Allison et al. Brain-computer interface systems: Progress and prospects. Expert Review of Medical Devices (2007)
U.B. Baloglu et al. Convolutional long-short term memory networks model for long duration EEG signal classification. Journal of Mechanics in Medicine and Biology (2019)
N. Birbaumer et al. A spelling device for the paralysed. Nature (1999)
D. Black et al. Auditory display as feedback for a novel eye-tracking system for sterile operating room interaction. International Journal of Computer Assisted Radiology and Surgery (2018)
B. Blankertz et al. The BCI competition III: Validating alternative approaches to actual BCI problems. IEEE Transactions on Neural Systems and Rehabilitation Engineering (2006)
L. Breiman. Bagging predictors. Machine Learning (1996)
H. Cecotti et al. Time delay neural network with Fourier transform for multiple channel detection of steady-state visual evoked potentials for brain-computer interfaces. European Signal Processing Conference (EUSIPCO) (2008)
H. Cecotti et al. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence (2011)
L. Chen et al. Health insurance and long-term care services for the disabled elderly in China: Based on CHARLS data. Risk Management and Healthcare Policy (2020)
L. Deng et al. Deep learning: Methods and applications. Foundations and Trends in Signal Processing (2013)
I. Goodfellow et al. Deep Learning (2016)
K.-W. Ha et al. Motor imagery EEG classification using capsule networks. Sensors (Switzerland) (2019)
K. He et al. Deep residual learning for image recognition
G. Hinton et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine (2012)
A. Hiraiwa et al. EEG Topography Recognition by Neural Networks. IEEE Engineering in Medicine and Biology Magazine (1990)
Hoffmann, U., Garcia, G., Vesin, J. M., Diserens, K., & Ebrahimi, T. (2005). A boosting approach to P300 detection...
U. Hoffmann et al. Application of the evidence framework to brain-computer interfaces
J. Jin et al. Developing a Novel Tactile P300 brain-computer interface with a cheeks-stim paradigm. IEEE Transactions on Biomedical Engineering