P300 event-related potential detection using one-dimensional convolutional capsule networks
Introduction
Recently, with the rapid development of artificial intelligence technology, research on human-computer intelligent interactions has become a focus in the field of ergonomics. Scholars have attempted to use interactive technologies such as speech recognition (Hinton et al., 2012), eye tracking (Black et al., 2018, Kim et al., 2019), gesture control (Morganti et al., 2012), and brain signals (Baloglu and Yildirim, 2019, Yıldırım et al., 2020) to improve and create new intelligent interactive experiences.
A brain-computer interface (BCI) is a direct communication pathway between the human brain and a computer that requires no physical interaction. It is also a cutting-edge and highly challenging type of interactive technology (Allison et al., 2007, Kostov and Polak, 2000). A BCI can acquire, decode, and recognize brain signals (Schalk et al., 2004) and use the results to make decisions (Shih et al., 2012), which allows it to help people with severe motor disabilities such as spinal cord injuries or amyotrophic lateral sclerosis (ALS) (Chen et al., 2020) interact more effectively with computers and other smart devices (Nicolas-Alonso & Gomez-Gil, 2012). The core of a BCI system is usually divided into four parts: recording signals generated by the user's brain, signal pre-processing, feature extraction, and brain signal classification. Because electroencephalogram (EEG) signals are collected non-invasively and can be acquired using relatively inexpensive equipment, they have become the common basis for building BCI systems. The EEG classification strategy depends on the stimulus itself, and mainly covers the following signal types: event-related potentials (ERPs) (Birbaumer et al., 1999; Jin, Chen, et al., 2020), steady-state evoked potential (SSEP) (Müller-Putz et al., 2006), motor imagery (MI) (Ha and Jeong, 2019, Jin et al., 2020, Jin et al., 2020, Jin et al., 2020) or slow cortical potential (SCP) (Pfurtscheller et al., 1997). The differences among these signal types mean that EEG signals require specific feature extraction methods and classification algorithms to achieve accurate classification.
Although neuroscience has provided knowledge and guidance on EEG detection and signal processing, machine learning algorithms allow feature extraction and modelling of the signal variability over time and across subjects (Müller et al., 2008). Therefore, machine learning algorithms are widely used in EEG signal classification. Classical machine learning algorithms such as linear discriminant analysis (LDA) (Jin, Li, et al., 2020), naive Bayes (NB) (Lotte et al., 2018), support vector machine (SVM) (Rakotomamonjy & Guigue, 2008), hidden Markov model (HMM) (Obermaier et al., 2001, Zhong and Ghosh, 2002), and neural networks (NNs) (Cecotti & Gräser, 2008) have achieved various levels of success in EEG classification. Some scholars have applied ensemble-based modeling order mixture and evolutionary-based order fusion methods to BCI recognition, with improved results (Atyabi et al., 2016). However, the accuracy of BCI systems based on P300 ERPs is unsatisfactory, and there is still room for improvement. Scholars first applied backpropagation neural networks to EEG pattern recognition (Hiraiwa et al., 1990), demonstrating that deep learning algorithms can be applied to EEG classification and BCI pattern recognition. Since then, various deep learning methods have been tested for EEG recognition and some have achieved good results (Thomas et al., 2017). In the most classic example, Cecotti et al. introduced CNNs into BCI to detect P300 ERPs. The authors proposed seven CNN-based classifiers and evaluated their performance and network topologies, with excellent classification results. The character recognition rate reached as high as 95.5%, outperforming the recognition rates of traditional machine learning methods (Cecotti & Gräser, 2011).
Because recurrent neural network (RNN) models have achieved good results on sequence information recognition tasks (such as speech recognition) (Lipton et al., 2015, Yao et al., 2020), the long short-term memory (LSTM) network and gated recurrent unit (GRU) have also been applied to EEG recognition. However, the classification results seem to be similar to those of a CNN, while the RNN models require longer training and testing times and their real-time performance is relatively poor (Joshi et al., 2018).
CNNs have achieved great success in computer vision in recent years, but they also have some limitations (Sabour et al., 2017). First, if the test data are distorted (tilted, rotated, etc.), the CNN classification results will be adversely affected. Second, the purpose of pooling operations in CNNs is to establish position invariance rather than equivariance. Even when this approach works well, it discards positional information contained in the data. In addition, CNNs learn limited spatial information by expanding the pooling field of view without considering the core spatial relationships between data objects; thus, they tend to lose the spatial positioning of different components in the data, which also leads to a decline in CNN classification performance. To solve these problems, Hinton et al. proposed a new type of deep neural network architecture, called Capsule Networks (CapsNet), in the paper “Dynamic Routing Between Capsules” published at the end of 2017 (Sabour et al., 2017). CapsNet uses the concept of capsules to automatically learn various object features, and it also considers the core spatial relationships between objects to retain the component spaces they occupy. CapsNet achieved 55% and 98.5% accuracy rates when classifying the SVHN and MNIST datasets, respectively, outperforming the previous best CNNs and making these the best unsupervised classification results to date (Kosiorek et al., 2019). CapsNet's excellent performance has led many scholars to try to apply the CapsNet model to other fields. In the BCI field, EEG signal data typically have a low signal-to-noise ratio (SNR). A large number of inconsistent and unstable interference signals are generated by electro-oculograms (EOGs), electromyograms (EMGs), power frequency, and other types of interference. In addition, the P300 ERP signals are usually submerged in the EEG, making them difficult to distinguish manually from the raw signal.
Therefore, BCI based on P300 ERP detection has many opportunities to benefit from CapsNet (Ha & Jeong, 2019).
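The pooling limitation described above can be illustrated with a small NumPy sketch. The function name `max_pool_1d` and the toy signals are hypothetical and are not taken from the paper; the sketch only shows that two inputs whose peak sits at different positions inside the same pooling window produce identical pooled outputs, so the exact position of the peak is discarded.

```python
import numpy as np

def max_pool_1d(x, window):
    """Non-overlapping 1-D max pooling; samples that do not fill a window are dropped."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // window) * window
    return x[:n].reshape(-1, window).max(axis=1)

# Two toy signals whose single peak is shifted by one sample within the same window.
a = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])

pooled_a = max_pool_1d(a, window=4)
pooled_b = max_pool_1d(b, window=4)

# The inputs differ, but the pooled outputs are the same.
print(np.array_equal(a, b), np.array_equal(pooled_a, pooled_b))
```

A capsule, by contrast, is intended to keep such pose information in the direction of its output vector rather than throwing it away.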
In this paper, P300 event-related potential EEG detection was researched and analyzed as follows. (i) According to the experimental paradigm and requirements, the original EEG data were organized into training sets and test sets through preprocessing that mainly includes data cleaning, data subsampling, and data normalization, to provide data support for the proposal and verification of the algorithm. (ii) On this basis, we introduced the dynamic routing between capsules theory. The hyperparameters of the model were analyzed and the model was reconstructed through the improved topology; then two classifiers based on the 1D-CapsNet model, 1D-CapsNet-64 and 1D-CapsNet-8, were proposed to classify the EEG data of 64 and 8 electrodes, respectively. (iii) The classification results were applied to character prediction to observe the accuracy of the classifier, the character prediction rate, the information transmission rate, and other indicators. (iv) The method proposed in this article was compared with other advanced machine learning algorithms, and the proposed method was shown to be feasible. The specific process of P300 signal detection is shown in Fig. 1.
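Step (i) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function name `preprocess`, the amplitude clipping used as "data cleaning", the block-averaging used as "subsampling", the subsampling factor, and the per-channel z-score are all hypothetical choices, since this excerpt does not specify the filters or parameters the authors actually used.

```python
import numpy as np

def preprocess(eeg, factor=2, clip_std=5.0):
    """Clean, subsample, and normalize EEG of shape (n_channels, n_samples).

    All three steps are assumed stand-ins, not the paper's exact pipeline.
    """
    eeg = np.asarray(eeg, dtype=float)

    # Data cleaning (assumed form): clip extreme amplitudes per channel.
    mean = eeg.mean(axis=1, keepdims=True)
    std = eeg.std(axis=1, keepdims=True)
    std = np.where(std == 0, 1.0, std)
    cleaned = np.clip(eeg, mean - clip_std * std, mean + clip_std * std)

    # Subsampling (assumed form): average non-overlapping blocks of `factor` samples.
    n = (cleaned.shape[1] // factor) * factor
    sub = cleaned[:, :n].reshape(cleaned.shape[0], -1, factor).mean(axis=2)

    # Normalization (assumed form): zero mean and unit variance per channel.
    mu = sub.mean(axis=1, keepdims=True)
    sd = sub.std(axis=1, keepdims=True)
    sd = np.where(sd == 0, 1.0, sd)
    return (sub - mu) / sd

# Synthetic stand-in for one 8-channel recording segment.
rng = np.random.default_rng(0)
segment = rng.normal(size=(8, 240))
features = preprocess(segment, factor=2)
print(features.shape)
```

In a real pipeline the cleaned and normalized segments would then be split into the training and test sets mentioned in step (i).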
Our contributions in this article are as follows:
- (1) A P300 ERP detection method based on the CapsNet model is proposed for the first time and used for character recognition in the P300 speller.
- (2) Although based on the existing CapsNet model, the CapsNet model used here is improved by adding a one-dimensional convolution so that it can decode the P300 ERP signal in the EEG. The resulting character recognition rate can reach 98%. The proposed network topology is studied in detail, and the most universal network topology is selected as the proposed model.
- (3) Two classifiers for P300 ERP signal detection based on 1D-CapsNet are proposed, making our proposed model more practical for BCI implementations. These two classifiers are used to classify EEG data with 64 electrodes and 8 electrodes, and their performances are evaluated.
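A character recognition rate such as the one quoted in contribution (2) is commonly converted into an information transfer rate using Wolpaw's definition, B = log2(N) + P log2(P) + (1 - P) log2((1 - P) / (N - 1)) bits per selection, where N is the number of selectable symbols and P is the accuracy. The sketch below assumes this definition; the function names, the 36-symbol speller, and the selection time are illustrative assumptions and are not stated in this excerpt.

```python
import math

def bits_per_selection(n_classes, accuracy):
    """Wolpaw information per selection, in bits."""
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_classes)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits

def itr_bits_per_minute(n_classes, accuracy, seconds_per_selection):
    """Scale the per-selection information to bits per minute."""
    return bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection

# Hypothetical example: 36 symbols, 98% accuracy, 30 s per character.
example_itr = itr_bits_per_minute(36, 0.98, seconds_per_selection=30.0)
print(round(example_itr, 2))
```

With this definition, a perfect classifier carries log2(N) bits per selection, and a classifier at chance level (P = 1/N) carries none.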
The remainder of this paper is organized as follows: Section 2 introduces the data set used in this article and reviews previous related research. Section 3 describes the proposed method in detail. Section 4 reports the experimental results and provides an analysis of the network topology. Finally, a detailed discussion and conclusions are presented in Sections 5 and 6, respectively.
Section snippets
Experiments and dataset
The P300 wave is one of the main components of ERPs, which are mainly obtained from an EEG signal. The P300 wave is a positive deflection of the voltage that occurs approximately 300 ms after the brain is stimulated, for example by a flash. In general, the amplitude of the P300 ERP signal is highest near the parietal lobe and occipital lobe (PZ electrode). It is difficult to find P300 ERP signals and extract their features directly from raw EEG signals without data processing. Although the detection
Original CapsNet
The main difference between the CapsNet and a traditional CNN is that a CNN creates a deep network through continuous convolutional layers. In contrast, CapsNet embeds neurons that focus on the same category or attribute into a capsule, which is a group of neurons. The length of the activity vector of the capsule represents the probability of the existence of the entity, and the direction of the activity vector represents the instantiation parameter. This allows the capsule network to perceive
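The idea that a capsule's vector length acts as a probability can be made concrete with the "squash" nonlinearity from Sabour et al. (2017), v = (||s||^2 / (1 + ||s||^2)) (s / ||s||), which shrinks short vectors toward zero length and long vectors toward unit length while keeping their direction. The NumPy sketch below is illustrative only; the small `eps` term and the toy capsule values are assumptions added for numerical safety and demonstration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squash nonlinearity: keeps direction, maps the length into [0, 1)."""
    s = np.asarray(s, dtype=float)
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    # eps avoids division by zero for an all-zero capsule (an added assumption).
    return scale * s / np.sqrt(sq_norm + eps)

# Two toy capsules: a weakly active one and a strongly active one.
raw = np.array([[0.1, 0.0, 0.0],
                [3.0, 4.0, 0.0]])
v = squash(raw)
lengths = np.linalg.norm(v, axis=-1)

# The lengths can be read as "entity present" probabilities,
# while the directions still encode the instantiation parameters.
print(lengths)
```

The weak capsule ends up with a length close to 0 and the strong one with a length close to 1, which is what allows the length to be interpreted as the probability that the entity exists.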
Parameter optimization and classifier selection
For the training and parameter optimization of 1D-CapsNet, we conducted experiments using a PC workstation equipped with an NVIDIA GeForce GTX 1070 GPU, an AMD Ryzen 7 1800X CPU, and 16 GB of RAM. The entire algorithm was implemented using the Python Keras neural network library.
In each 50 epochs of training, the Adam optimization method, which has a fast convergence speed and a good optimization effect, is used to update the parameters. By default, the learning rate is 0.001, , ,
Discussion
In this paper, the feasibility of the CapsNet model in EEG detection is demonstrated via a large number of experiments. The original CapsNet model is improved by one-dimensional convolution and its network structure is modified to make it more suitable for P300 ERP detection. Compared with other methods based on machine learning and deep learning regarding performance and effectiveness, the experimental results in Section 4 show that the detection methods based on the 1D-CapsNet model
Conclusion
This study proposed a 1D-CapsNet method for P300 ERP detection based on the “Dynamic Routing Between Capsules” theory. This model combined the idea of one-dimensional convolution with the traditional CapsNet model to make it more suitable for EEG signal detection. Concretely, to make 1D-CapsNet more practical for BCI applications, the proposed method applied the 1D-CapsNet-64 and 1D-CapsNet-8 classifiers to the detection of P300 ERPs with different numbers of electrodes, and was compared with other
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Xiang Liu: Conceptualization, Writing - original draft, Methodology, Formal analysis, Software. Qingsheng Xie: Supervision, Conceptualization. Jian Lv: Software, Writing - review & editing. Haisong Huang: Visualization, Methodology. Weixing Wang: Data curation, Visualization.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under grants 52065010 and 51865004, the Science and Technology Top Talent Support Program Project of Guizhou Province under grant KY[2018]037, and in part by the Department of Education Project of Guizhou Province under grant YJSCXJH[2019]108.
References (52)
- et al.
Mixture of autoregressive modeling orders and its implication on single trial EEG classification
Expert Systems with Applications
(2016)
- et al.
Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials
Electroencephalography and Clinical Neurophysiology
(1988)
- et al.
Deep learning based on Batch Normalization for P300 signal detection
Neurocomputing
(2018)
- et al.
A smart watch with embedded sensors to recognize objects, grasps and forearm gestures
Procedia Engineering
(2012)
- et al.
Machine learning for real-time single-trial EEG-analysis: From brain-computer interfacing to mental state monitoring
Journal of Neuroscience Methods
(2008)
- et al.
Hidden Markov models for online classification of single trial EEG data
Pattern Recognition Letters
(2001)
- et al.
EEG-based discrimination between imagination of right and left hand movement
Electroencephalography and Clinical Neurophysiology
(1997)
- et al.
Brain-computer interfaces in medicine
Mayo Clinic Proceedings
(2012)
- et al.
Brain-computer interface systems: Progress and prospects
Expert Review of Medical Devices
(2007)
- et al.
Convolutional long-short term memory networks model for long duration EEG signal classification
Journal of Mechanics in Medicine and Biology
(2019)
A spelling device for the paralysed
Nature
Auditory display as feedback for a novel eye-tracking system for sterile operating room interaction
International Journal of Computer Assisted Radiology and Surgery
The BCI competition III: Validating alternative approaches to actual BCI problems
IEEE Transactions on Neural Systems and Rehabilitation Engineering
Bagging predictors
Machine Learning
Time delay neural network with fourier transform for multiple channel detection of steady-state visual evoked potentials for brain-computer interfaces
European Signal Processing Conference, Eusipco.
Convolutional neural networks for P300 detection with application to brain-computer interfaces
IEEE Transactions on Pattern Analysis and Machine Intelligence
Health insurance and long-term care services for the disabled elderly in China: Based on CHARLS data
Risk Management and Healthcare Policy
Deep learning: Methods and applications
Foundations and Trends in Signal Processing
Deep Learning
Motor imagery EEG classification using capsule networks
Sensors (Switzerland)
Deep residual learning for image recognition
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
IEEE Signal Processing Magazine
EEG Topography Recognition by Neural Networks
IEEE Engineering in Medicine and Biology Magazine
Application of the evidence framework to brain-computer interfaces
Developing a Novel Tactile P300 brain-computer interface with a cheeks-stim paradigm
IEEE Transactions on Biomedical Engineering