1 Introduction

Electroencephalogram (EEG) signals play a pivotal role in medical diagnosis and cognitive neuroscience. Many researchers have been attracted to EEG signal classification because these signals reflect specific aspects of human behavior. The electroencephalogram is an aperiodic time series produced by a neural field and is formed by the summation of a large number of neuronal membrane potentials. Although various neuroimaging techniques have been developed, EEG recording remains a widely accepted method among researchers; through Brain-Computer Interfaces (BCIs), it has been successfully employed in diagnosing neurological diseases and understanding psycho-physiological processes. A BCI is a system that measures, processes, and transforms the brain activity patterns of a user into messages or commands for an interactive application. For instance, using a BCI, a user can imagine left or right hand movements and thereby move a cursor to the left or right of a computer screen. EEG is also used to diagnose mental problems such as sleeping disorders, epilepsy, Alzheimer's disease, and schizophrenia [1]. To this end, various machine learning techniques have been investigated, which are reviewed in the continuation of this paper.

In this study, we aim to construct a high-performance EEG classification method using a radial basis function neural network (RBFNN) classifier. To this end, we first propose a new radial basis function classifier; then, we employ the Locally Linear Embedding (LLE) [2] dimensionality reduction technique to increase the efficiency of the proposed EEG classification method.

Nowadays, deep learning based algorithms, especially convolutional neural networks, have been successfully applied in different computer vision applications [3]. Additionally, many studies have aimed to improve the performance of RBFNN classifiers [4]. These methods are mostly based on particle swarm optimization (PSO), the k-means algorithm, fuzzy clustering, and other meta-heuristic algorithms such as harmony search. The authors of [5] extract ten features from EEG signals based on the discrete wavelet transform (DWT) for epilepsy detection, claiming that this rich feature set helps the classifiers achieve good performance. They used entropy, min, max, mean, median, standard deviation, variance, skewness, energy, and relative wave energy (RWE) for feature extraction. In [6], a hybrid classification model combining the Grasshopper Optimization Algorithm (GOA) and a support vector machine (SVM), called GOA-SVM, was proposed for automatic seizure detection in EEG. First, the EEG signals were preprocessed using the discrete wavelet transform; then ten features were extracted from the wavelet coefficients, the best features were selected, and the parameters were optimized using the GOA. Finally, the selected features were classified by an SVM. The authors of [7] proposed a hybrid EEG classification approach based on the grey wolf optimizer (GWO), an enhanced version of the support vector machine called GWO-SVM, for automatic seizure detection. They used the discrete wavelet transform to decompose EEG signals and extract features; these features were then used to train an SVM with a radial basis function (RBF) kernel. Reference [8] presents a comprehensive review of EEG signal classification. In [9], the authors proposed a discrete wavelet transform-discrete cosine transform (DWT-DCT) based Bacterial Foraging Optimization (BFO) technique for watermarking two-dimensional EEG data. The authors of [10] proposed a large-scale learning approach with a stacked ensemble meta-classifier and deep learning-based feature fusion for COVID-19 classification. They extracted features from the penultimate layer (global average pooling) of EfficientNet-based pre-trained models and reduced the dimensionality of the extracted features using kernel principal component analysis (PCA). In [11], a whale optimization-based neural synchronization was proposed for the development of a key exchange protocol.

The aforementioned approaches propose new learning methods for specifying the centers of the Gaussian functions of RBFNNs. However, some problems remain. For instance, the methods based on k-means clustering are highly sensitive to center initialization. This flaw causes them to get stuck in local minima, and consequently the capability of the RBFNN classifier to learn the problem drops drastically. To alleviate this deficiency, replacing k-means with PSO-based clustering methods was proposed. Although this replacement brings more robustness to the center initialization step, it has been shown that PSO-based clustering is time consuming when the problem at hand is high dimensional, and it is not guaranteed to converge to the global minimum. To tackle these problems, we propose using Jellyfish Search (JS) [12] for selecting the centers of the Gaussian functions of an RBFNN classifier. The jellyfish algorithm has some unique advantages over traditional optimization techniques: it is simple, pliable, and scalable, and it balances exploration and exploitation of the search space, leading to good convergence behavior. As a result, the high-performance regions of the solution space can be identified by JS reliably and in reasonable time. Thus, if the centers of the Gaussian functions are specified by the JS algorithm, the Gaussian functions can span the problem space more effectively, and the performance of the RBFNN classifier in learning the problem increases. Our proposed JS-based RBFNN classifier (JS-RBFNN) is applied to EEG signal recognition. As mentioned, before classification of the EEG signals, Locally Linear Embedding (LLE) is utilized to reduce their dimensionality. Experimental results show that the proposed approach achieves higher performance than the existing state-of-the-art in the field.

The remainder of the paper is organized as follows. In Sect. 2, the proposed method for EEG signals classification is introduced. Experimental results are discussed in Sect. 3 and the paper is concluded in Sect. 4.

2 The proposed method for EEG signals classification

This section outlines our proposed method for EEG signals classification. Figure 1 shows the flowchart of the proposed approach. In our study, we use the DEAP dataset [13]. This dataset contains three groups of features: EEG signals, physiological signals, and multimedia content analysis (MCA). As seen in Fig. 1, preprocessing is first applied to the raw data, and features are extracted from the EEG signals. The EEG features were extracted using 32 electrodes; each electrode contributes features in specific power spectral bands, namely theta (4–8 Hz), slow alpha (8–10 Hz), alpha (8–12 Hz), beta (12–30 Hz), and gamma (30+ Hz) spectral power per electrode. In the DEAP dataset, there are 8064 features for each sample. Using locally linear embedding, the dimension is reduced to 4000. The reduced features are then randomly partitioned into a training set and a test set, which are used for training and testing our proposed JS-RBFNN classifier.

Fig. 1 Proposed approach for EEG signals classification

In the following subsections, each step of the proposed EEG signals classification method is elaborated.

2.1 Dimensionality reduction using locally linear embedding (LLE)

In this study, we apply locally linear embedding to reduce the dimension of the EEG signals. The LLE algorithm reduces dimensionality while preserving the local spatial relationships between data points [2]. By exploiting local symmetries of the data, LLE linearly reconstructs each high-dimensional data point from its neighbors, realizing a mapping from the high-dimensional space to a manifold in a lower dimension [2].

2.1.1 The LLE algorithm

Consider a dataset \(X = \left\{ {{\varvec{x}}_{1} ,{\varvec{x}}_{2} , \ldots ,{\varvec{x}}_{{\varvec{N}}} } \right\}\) in a d-dimensional space \(R^{d}\) to be reduced to a dataset \(Y = \left\{ {{\varvec{y}}_{1} ,{\varvec{y}}_{2} , \ldots ,{\varvec{y}}_{{\varvec{N}}} } \right\}\) in an l-dimensional space \(R^{l}\). To preserve the spatial information of the original data, the k-nearest-neighbor (kNN) graph of \(Y\) should be locally similar to the kNN graph of \(X\). Suppose that \({\varvec{x}}\) is reconstructed from its k nearest neighbors \(X_{kNN} = \{ x_{j} |1 \le j \le k\}\) with weights \(w_{j}\), namely:

$${\varvec{x}} = \mathop \sum \limits_{j = 1}^{k} w_{j} {\varvec{x}}_{{\varvec{j}}}$$
(1)

then

$${\varvec{y}} = \mathop \sum \limits_{j = 1}^{k} w_{j} {\varvec{y}}_{{\varvec{j}}}$$
(2)

Now, the LLE algorithm can be stated as follows:

  1. A kNN graph \({\text{G}}_{{{\text{kNN}}}} \left( {\text{X}} \right)\) with k nearest neighbors \(X_{kNN} \left( {x_{i} } \right) = \{ {\varvec{x}}_{{{\Gamma }_{ij} }} |1 \le j \le k\}\) is constructed for each data point in \(X\);

  2. A weight matrix \(W\) is computed by solving the following optimization problem:

    $$W = \mathop {{\text{argmin}}}\limits_{W} \sum\limits_{i = 1}^{N} {\left\| {x_{i} - \sum\limits_{j = 1}^{k} {W_{{i\Gamma _{ij} }} x_{{\Gamma _{ij} }} } } \right\|^{2} }$$
    (3)
    $$\forall i,j,j \ne {\Gamma }_{ij} , W_{ij} = 0, \mathop \sum \limits_{j = 1}^{k} W_{{i{\Gamma }_{ij} }} = 1$$
    (4)
  3. The data are mapped to a new space \(Y\) as follows:

    $$\varepsilon \left( Y \right) = \sum\limits_{i = 1}^{N} {\left\| {y_{i} - \sum\limits_{j = 1}^{k} {W_{{i\Gamma _{ij} }} y_{{\Gamma _{ij} }} } } \right\|^{2} } = \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{N} M_{ij} y_{i}^{T} y_{j}$$
    (5)

where \({\varvec{M}} = ({\varvec{I}} - {\varvec{W}})^{{\text{T}}} \left( {{\varvec{I}} - {\varvec{W}}} \right)\) [2]. By computing the eigendecomposition of \({\varvec{M}}\), we obtain the new lower-dimensional space. Therefore, \(Y\) can be obtained as follows:

$${\text{Y}} = [v_{2} ,v_{3} , \ldots ,v_{l + 1} ]^{T} ,$$
(6)

where \(v_{2} ,v_{3} , \ldots ,v_{l + 1}\) are the eigenvectors of \({\varvec{M}}\) corresponding to its second through \((l + 1)^{st}\) smallest eigenvalues.
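To make the procedure concrete, the following minimal sketch performs this reduction with scikit-learn's LocallyLinearEmbedding (our illustration, not the authors' code). The array sizes and the neighborhood size k are illustrative assumptions; the paper's actual reduction is from 8064 to 4000 dimensions.

```python
# A minimal sketch of the LLE step of Sect. 2.1 using scikit-learn.
# Sizes are illustrative assumptions, not the paper's configuration.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8064))   # N = 500 patterns, d = 8064 features

lle = LocallyLinearEmbedding(
    n_neighbors=30,     # k nearest neighbors used in Eqs. (1)-(4)
    n_components=50,    # target dimension l (the paper reduces 8064 -> 4000)
    method="standard",  # classic LLE of Roweis and Saul [2]
)
Y = lle.fit_transform(X)               # rows of Y realize Eq. (6)
print(Y.shape)                         # (500, 50)
```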

2.2 Radial basis function neural network classifier

There are three layers in a radial basis function neural network classifier: an input layer, a hidden layer, and an output layer. The input data are fed to the hidden layer via the input layer. In the hidden layer, the data are mapped from a lower-dimensional space to a higher-dimensional one in which they become linearly separable. This layer is composed of k Gaussian functions. Finally, there are m neurons in the output layer, each corresponding to an output class. Consequently, the radial basis function neural network performs the transformation \(f: {\mathcal{R}}^{n} \to {\mathcal{R}}^{m}\) such that [14]:

$$y_{s} \left( P \right) = \sum\limits_{j = 1}^{k} {w_{js} } \varphi \left( {\frac{{\left\| {P - C_{j} } \right\|}}{{\sigma_{j} }}} \right),\quad {\text{for}}\;1 \le s \le m,$$
(7)

where \(y_{s}\) is the sth network output, \(P\) is an input pattern, and \(w_{js}\) is the synaptic weight between the jth hidden neuron and the sth output neuron. Also, \(C_{j}\) and \(\sigma_{j}\) are the center and the width of the jth Gaussian function in the hidden layer, respectively. In Eq. (7), \(\varphi\) is the Gaussian function, defined as follows [14]:

$$\varphi \left( r \right) = e^{{ - r^{2} }} ,$$
(8)

The functionality and performance of an RBFNN classifier depend on three important factors: the centers of the Gaussian functions, the widths of the Gaussian functions, and the training algorithm used for adjusting the synaptic weights. As discussed above, the Jellyfish Search (JS) algorithm can reliably locate the high-performance regions of the search space. Hence, we propose a new method for learning the centers of the Gaussian functions using the JS algorithm.
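Before detailing the center-learning method, a minimal NumPy sketch of the forward pass of Eqs. (7)-(8) may help. The random parameters below are placeholders for the centers produced by the JS-based clustering of Sect. 2.4 and the weights learned in Sect. 2.4.1.

```python
# A sketch of the RBFNN forward pass of Eqs. (7)-(8); parameters are
# random placeholders, not trained values.
import numpy as np

def rbfnn_forward(P, C, sigma, W):
    """P: (n,) input pattern; C: (k, n) centers; sigma: (k,) widths;
    W: (k, m) synaptic weights. Returns the (m,) output vector y."""
    r = np.linalg.norm(P - C, axis=1) / sigma  # ||P - C_j|| / sigma_j
    phi = np.exp(-r ** 2)                      # Gaussian function, Eq. (8)
    return phi @ W                             # y_s = sum_j w_js phi_j, Eq. (7)

rng = np.random.default_rng(1)
k, n, m = 30, 4000, 4
y = rbfnn_forward(rng.standard_normal(n), rng.standard_normal((k, n)),
                  np.ones(k), rng.standard_normal((k, m)))
```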

2.3 Jellyfish search (JS)

Figure 2 illustrates the motions of jellyfish in the ocean. The rules of the jellyfish meta-heuristic algorithm, as stated in [12], are as follows:

  1. At each time step, a jellyfish either moves with the ocean current or moves inside the swarm;

  2. Jellyfish are attracted to areas where plenty of food is available;

  3. The amount of food available at a location is measured by evaluating the fitness function at that location.

Fig. 2 Jellyfish in the ocean [12]

2.3.1 Ocean current

Jellyfish are attracted to places where a considerable amount of food is available. The direction toward these places (\(\overrightarrow {{\text{trend }}}\)) is determined as follows [12]:

$$\overrightarrow {{{\text{trend}}}} = X^{*} - \beta \times {\text{rand}}\left( {0,1} \right) \times \mu$$
(9)

where \(X^{*}\) is the location of the current best jellyfish in the swarm, \(\mu\) is the mean of all jellyfish locations, and \({\upbeta }\) is a distribution coefficient that controls the length of \(\overrightarrow {{{\text{trend}}}}\).

Accordingly, the location of the \(i^{th}\) jellyfish at time t + 1, \(X_{i} (t + 1)\), is obtained as follows:

$$X_{i} \left( {t + 1} \right) = X_{i} \left( t \right) + {\text{rand}}\left( {0,1} \right) \times \overrightarrow {{{\text{trend}}}}$$
(10)

2.3.2 Jellyfish swarm

A dense population of jellyfish is called a swarm, in which two types of movement occur: type A (passive movement) and type B (active movement).

In type A motion, jellyfish move around their own locations, and the updated location of each jellyfish is given by [12]:

$$X_{i} \left( {t + 1} \right) = X_{i} \left( t \right) + 0.1 \times rand \left( {0 ,1} \right) \times \left( {U_{b} - L_{b} } \right)$$
(11)

where \(U_{b}\) and \(L_{b}\) are the upper and lower bounds of the search space, respectively [12].

Equation (12) describes type B movement. Suppose that we want to update the position of jellyfish i. To this end, we randomly pick another jellyfish, say jellyfish j, and compare the amount of food at the two locations. If the amount of food at location j exceeds that at location i, jellyfish i moves toward jellyfish j; otherwise, jellyfish i moves away from location j. The direction of motion is given by Eq. (13), and the updated position of jellyfish i is calculated by Eq. (14).

$$\overrightarrow {{{\text{step}}}} = {\text{rand}}\left( {0,1} \right) \times \overrightarrow {{{\text{Direction}}}}$$
(12)
$$\overrightarrow {{{\text{Direction}}}} = \left\{ {\begin{array}{*{20}c} {X_{j} \left( t \right) - X_{i} \left( t \right)\quad if\; f\left( {X_{i} } \right) \ge f\left( {X_{j} } \right)} \\ {X_{i} \left( t \right) - X_{j} \left( t \right)\quad if\; f\left( {X_{i} } \right) < f\left( {X_{j} } \right)} \\ \end{array} } \right.$$
(13)
$$X_{i} \left( {t + 1} \right) = X_{i} \left( t \right) + \overrightarrow {{{\text{step}}}}$$
(14)

Type A movement dominates in the early iterations, whereas type B movement has a greater impact on the final result toward the end of the search [12].

2.3.3 Time control mechanism

The ocean current contains large amounts of nutritious food that attract jellyfish [15]. Over time, more jellyfish gather into the ocean current and a jellyfish swarm forms. When temperature or wind changes the ocean current, the jellyfish move into another ocean current and form another swarm. Inside a swarm, jellyfish exhibit type A and type B motion and switch between them: type A is favored initially, and type B is increasingly preferred as time goes by. The time control mechanism models this behavior. To regulate whether jellyfish follow the ocean current or move inside the swarm, it uses a time control function c(t) and a threshold constant \({c}_{0}\). The time control function, formulated in Eq. (15), is a random value that varies between 0 and 1 over time. When c(t) exceeds \({c}_{0}\), the jellyfish follow the ocean current; when it is less than \({c}_{0}\), they move inside the swarm. Since \({c}_{0}\) cannot be known exactly and c(t) varies randomly between zero and one, \({c}_{0}\) is set to 0.5, the mean of zero and one.

$${\text{c}}\left( {\text{t}} \right) = \left| {\left( {1 - \frac{t}{{MAX_{iter} }}} \right) \times \left( {2 \times rand\left( {0,1} \right) - 1} \right)} \right|$$
(15)

where t is the time expressed as an iteration counter and \(MAX_{iter}\) is the maximum number of iterations, an initialization parameter. The function 1 − c(t) governs the type of motion a jellyfish exhibits inside a swarm: when rand(0,1) exceeds 1 − c(t), the jellyfish exhibits type A motion, and when rand(0,1) is lower than 1 − c(t), it exhibits type B motion. Since 1 − c(t) increases from zero to one over time, the probability that rand(0,1) > 1 − c(t) initially exceeds the probability that 1 − c(t) > rand(0,1), so type A motion is favored over type B. As time passes, 1 − c(t) approaches one and the probability that 1 − c(t) > rand(0,1) ultimately dominates, so type B motion becomes more likely.
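Putting Eqs. (9)-(15) together, the following compact sketch shows one plausible implementation of the JS optimizer for minimization. Here \(\beta = 3\) follows [12], while the greedy replacement and the boundary clipping are assumptions of this sketch rather than details given in this section.

```python
# A sketch of the JS optimizer implementing Eqs. (9)-(15); greedy
# replacement and clipping are implementation choices of this sketch.
import numpy as np

def jellyfish_search(fitness, lb, ub, n_pop=30, max_iter=200, beta=3.0, seed=0):
    """Minimize `fitness` over the box [lb, ub]; lb and ub are 1-D arrays."""
    rng = np.random.default_rng(seed)
    dim = lb.size
    X = lb + rng.random((n_pop, dim)) * (ub - lb)             # initial swarm
    f = np.array([fitness(x) for x in X])
    for t in range(max_iter):
        c = abs((1 - t / max_iter) * (2 * rng.random() - 1))  # Eq. (15)
        for i in range(n_pop):
            if c >= 0.5:                              # follow the ocean current
                trend = X[f.argmin()] - beta * rng.random() * X.mean(axis=0)  # Eq. (9)
                x_new = X[i] + rng.random(dim) * trend        # Eq. (10)
            elif rng.random() > 1 - c:                # type A (passive) motion
                x_new = X[i] + 0.1 * rng.random(dim) * (ub - lb)  # Eq. (11)
            else:                                     # type B (active) motion
                j = (i + 1 + rng.integers(n_pop - 1)) % n_pop  # some j != i
                direction = X[j] - X[i] if f[i] >= f[j] else X[i] - X[j]  # Eq. (13)
                x_new = X[i] + rng.random(dim) * direction    # Eqs. (12), (14)
            x_new = np.clip(x_new, lb, ub)            # stay inside the bounds
            f_new = fitness(x_new)
            if f_new < f[i]:                          # keep the better location
                X[i], f[i] = x_new, f_new
    return X[f.argmin()], f.min()
```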

2.4 Proposed method for learning centers of the RBFNN classifier

This section elaborates our proposed learning method using the JS algorithm for selecting the centers of the Gaussian functions in the hidden layer of a radial basis function neural network classifier. This novel RBFNN classifier is called JS-RBFNN.

We want to cluster the \(N\) samples into \(K\) groups in such a way that samples within one group are very similar to each other while samples from different groups are very different. This property is captured by the fitness function of Eq. (20) [16].

At first, JS evolves a population of NP n-dimensional individual vectors; the quality of each candidate solution is then checked using the fitness function in Eq. (20).

In order to calculate the RBFNN centers using the JS algorithm, suppose that \(M\) represents a set of cluster centroids, namely \(M = \left( {M_{1} ,M_{2} , \ldots ,M_{k} } \right)\), where \(M_{j} = \left( {s_{j1} ,s_{j2} , \ldots ,s_{jl} , \ldots ,s_{jf} } \right)\) refers to the jth cluster centroid. To measure similarity more efficiently, the Euclidean distance between each feature of the input pattern and the corresponding feature of the cluster centroid is computed by Eq. (16) as follows:

$$d\left( {M_{jl} ,P_{rl} } \right) = \sqrt {\left( {s_{jl} - t_{rl} } \right)^{2} } \quad {\text{for}}\;1 \le j \le k,\;1 \le r \le n,\;1 \le l \le f$$
(16)

After all the distances are calculated, feature l of pattern r is compared with the corresponding feature of each cluster j, and \(Z_{jrl}\) is set to 1 for the cluster whose Euclidean distance for feature l of pattern r is minimum [16]:

$$Z_{jrl} = \left\{ {\begin{array}{*{20}l} 1\quad if\; d\left( {M_{jl} ,P_{rl}}\right)\; is\; min \\ 0\quad elsewhere \\ \end{array} } \right.$$
(17)

In the next step, the mean of the data, \(N_{jl}\), is computed for each feature according to [16]:

$$N_{jl} = \frac{{\mathop \sum \nolimits_{r = 1}^{n} t_{rl} \times Z_{jrl} }}{{\mathop \sum \nolimits_{r = 1}^{n} Z_{jrl} }}\quad {\text{for}}\;1 \le j \le k,\;1 \le l \le f$$
(18)

Then, for each feature l of the cluster j, the Euclidean distances between mean of data \(N_{jl}\) and the centroids \(S_{jl}\) are computed by [16]:

$$d\left( {N_{jl} ,S_{jl} } \right) = \sqrt {\left( {N_{jl} - S_{jl} } \right)^{2} } \quad {\text{for}}\;1 \le j \le k,\;1 \le l \le f$$
(19)

Now, the overall fitness function is obtained by summing the calculated distances over all clusters as follows [16]:

$$F\left( {W,M,X} \right) = \mathop \sum \limits_{j = 1}^{K} \mathop \sum \limits_{i = 1}^{N} w_{ij} \left\| {x_{i} - c_{j} } \right\|^{2}$$
(20)

where K is the number of clusters, \(x_{i}\) is the ith sample, \(c_{j}\) is the center of the jth cluster, \({w}_{ij}\) indicates the membership of the ith sample in the jth cluster, \(X\) is the data matrix, \(M\) is the matrix of cluster centers, and \(W\) is the membership matrix. If the ith sample belongs to the jth cluster, \({w}_{ij}\) is 1; otherwise it is zero:

$$w_{{{\text{ij}}}} = \left\{ {\begin{array}{*{20}c} {1\quad x_{i} \in Cluster_{j} } \\ {0\quad x_{i} \notin Cluster_{j} } \\ \end{array} } \right.\quad \forall i = 1,2, \ldots , N , j = 1,2, \ldots , K$$
(21)

After the centers of the Gaussian functions are determined, the p-nearest-neighbor algorithm is applied to adjust their widths.
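In code, the fitness that JS minimizes can be sketched as follows. A candidate JS solution flattens the k centroids into a single vector; the per-feature bookkeeping of Eqs. (16)-(18) is folded here into the usual nearest-center assignment of Eqs. (20)-(21), which is a simplification of this sketch.

```python
# A sketch of the clustering fitness of Eqs. (20)-(21) for one JS candidate.
import numpy as np

def clustering_fitness(flat_centers, X, k):
    """flat_centers: (k*f,) candidate centroids; X: (N, f) training patterns."""
    C = flat_centers.reshape(k, -1)                      # M = (M_1, ..., M_k)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # squared distances
    nearest = d2.argmin(axis=1)                          # hard memberships, Eq. (21)
    return d2[np.arange(len(X)), nearest].sum()          # F(W, M, X), Eq. (20)
```

Combined with the JS sketch of Sect. 2.3, a call such as `jellyfish_search(lambda v: clustering_fitness(v, X_train, k), lb, ub)` would return the flattened Gaussian centers.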

Pseudo-code of the proposed method for learning the centers of the RBFNN is shown in Table 1.

Table 1 Pseudo-code of learning centers of the RBFNN

2.4.1 Learning algorithm for synaptic weights

The weights of our proposed neural network are determined by the gradient descent method as follows [17]:

$$\Delta W_{opt} = \lambda_{opt} \Delta W = \frac{{\left( {E\phi } \right)(E\phi )^{T} \left( {E\phi } \right)}}{{\left( {E\phi \phi^{T} } \right)(E\phi \phi^{T} )^{T} }}$$
(22)

Hence

$$W_{new} = W_{old} + \frac{{\left( {E\phi } \right)(E\phi )^{T} \left( {E\phi } \right)}}{{\left( {E\phi \phi^{T} } \right)(E\phi \phi^{T} )^{T} }}$$
(23)

where \(W\) is initialized randomly.
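A simplified sketch of this training step is given below; a fixed learning rate `lr` stands in for the adaptive optimal rate \(\lambda_{opt}\) of Eq. (22), so this illustrates the gradient-descent idea rather than the exact rule of [17].

```python
# A simplified sketch of the weight learning of Sect. 2.4.1: plain gradient
# descent with a fixed learning rate instead of the adaptive rate of Eq. (22).
import numpy as np

def train_output_weights(Phi, T, epochs=5000, lr=1e-3, seed=0):
    """Phi: (N, k) Gaussian activations; T: (N, m) one-hot targets."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((Phi.shape[1], T.shape[1]))  # random init
    for _ in range(epochs):
        E = T - Phi @ W          # output error on the training set
        W += lr * Phi.T @ E      # step along the negative error gradient
    return W
```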

3 Experimental results and comparisons

As stated above, two novelties are proposed in this work. First, a new learning algorithm is investigated for determining the centers of the Gaussian functions of the RBFNN classifier. Second, locally linear embedding is employed for feature reduction in EEG signals classification. To measure the performance of the proposed methods, confusion matrices are reported. In our experiments, we first measure the performance of the proposed RBFNN classifier on the Proben1 dataset, a popular benchmark in pattern recognition. Next, through various experiments, we evaluate the performance of our proposed method on EEG signals classification.

3.1 Experimental results on the Proben1

Proben1 contains 15 datasets covering 12 real-world problems and is used for classification and function approximation benchmarks. In this paper, we use five benchmark problems from Proben1 that have already been used by other researchers [18]. A description of these datasets is given in Table 2, whose No. of features, No. of classes, and No. of patterns fields show the number of features, classes, and patterns of each dataset, respectively.

Table 2 A short description of the Proben1 dataset

Table 3 shows the parameters and the results obtained by testing our proposed JS-RBFNN classifier. In all the experiments of this study, the number of hidden neurons (HN) of the JS-RBFNN was set to 30% of the size of the training set. To obtain a robust estimate and avoid overfitting to a particular split, each experiment was repeated five times and the average accuracy is reported as the final result. In all the experiments, we used 70% of the samples for training and the rest for testing. Table 3 indicates that the JS-RBFNN reaches reliable performance, which stems from the ability of the jellyfish search algorithm to select proper centers for the Gaussian functions of the hidden layer.
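For clarity, the evaluation protocol can be sketched as follows; `make_js_rbfnn` is a hypothetical factory for the classifier described in Sect. 2, not part of the authors' code.

```python
# A sketch of the protocol: five random 70/30 splits, HN = 30% of the
# training set, averaged test accuracy. The classifier factory is hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split

def evaluate(make_js_rbfnn, X, y, repeats=5, seed=0):
    accs = []
    for r in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed + r)
        clf = make_js_rbfnn(hidden_neurons=int(0.3 * len(X_tr)))  # HN = 30%
        clf.fit(X_tr, y_tr)
        accs.append(np.mean(clf.predict(X_te) == y_te))
    return float(np.mean(accs))        # averaged final accuracy
```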

Table 3 The classification performance (Accuracy) of the JS-RBFNN classifier on Proben1 dataset

In Table 3, A-RBFNN [18] is an RBFNN classifier based on Atanassov's intuitionistic fuzzy set theory. FCM-RBFNN [22] employs fuzzy c-means clustering to determine the centers of the Gaussian functions of the hidden layer. PSO-RBFNN [16] applies a clustering method based on particle swarm optimization (PSO) to select the centers of the Gaussian functions. MLP-HS [19] is a multi-layer perceptron classifier trained by harmony search. The Bayes classifier is a non-parametric Bayesian classifier (Soria et al. 2011), and ILVQ [20] is a version of the learning vector quantization classifier.

Table 3 shows that the performance of the JS-RBFNN is superior to the other classifiers on the Glass, Ionosphere, and Iris datasets. However, on the Wine and Abalone datasets, A-RBFNN outperforms our proposed JS-RBFNN. Overall, the performance of our proposed classifier is nearly 2.2% higher than that of the A-RBFNN classifier. One advantage of our method over the others is that an effective number of hidden neurons can be found easily: it is simply set to 30% of the training samples. Another advantage is its accurate clustering.

3.2 Experimental results on DEAP dataset

The DEAP dataset consists of two parts [13]. The first part contains ratings of 120 one-minute music videos by 14–16 participants, based on arousal, valence, and dominance. In the second part, 32 participants rated 40 of the 120 music videos while their EEG and physiological signals were recorded; face video was also captured for 22 participants. In this study we use the valence and arousal ratings, as shown in Table 4. Valence and arousal were subdivided into four quadrants, namely low arousal/low valence (LALV), low arousal/high valence (LAHV), high arousal/low valence (HALV), and high arousal/high valence (HAHV). In this paper, the patterns are classified into these quadrants.

Table 4 DEAP dataset representation for each subject [13]

Table 5 reports the parameters and the results obtained by testing our proposed JS-RBFNN for EEG signals classification when no feature selection or dimensionality reduction is applied. In all of these experiments, the training set and the test set each contain 50% of the data; we set the test portion to 50% because our goal was to challenge the JS-RBFNN with a training set of relatively small size. To train and test all classes, some data of each class were selected randomly.

Table 5 The parameters and classification performance of the JS-RBFNN classifier on DEAP dataset

Based on Table 5, using the JS-RBFNN classifier on the test set of Valence labels, the accuracy is 62.1%, and on the test set of Arousal labels, the accuracy is 62.3%.

Table 6 reports the results of EEG signals classification using the JS-RBFNN classifier when locally linear embedding is adopted for dimensionality reduction of the original DEAP dataset. Figures 3 and 4 show the confusion matrices of the JS-RBFNN on the original Valence and Arousal data. In these figures, classes 1, 2, 3, and 4 represent LALV, LAHV, HALV, and HAHV, respectively.
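Confusion matrices such as those in Figs. 3-6 can be computed as sketched below; the label vectors are hypothetical stand-ins for the classifier's test-set output.

```python
# How a quadrant confusion matrix can be computed; labels are hypothetical.
from sklearn.metrics import confusion_matrix

quadrants = ["LALV", "LAHV", "HALV", "HAHV"]        # classes 1-4
y_true = ["LALV", "LAHV", "HALV", "HAHV", "LALV", "HAHV"]
y_pred = ["LALV", "HALV", "HALV", "LAHV", "HAHV", "HAHV"]
print(confusion_matrix(y_true, y_pred, labels=quadrants))  # rows: true, cols: predicted
```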

Table 6 The parameters and classification performance of the JS-RBFNN classifier on DEAP dataset while LLE was applied for dimension reduction
Fig. 3 Confusion matrix of JS-RBFNN on original data of Valence

Fig. 4 Confusion matrix of JS-RBFNN on original data of Arousal

As Fig. 3 shows, when classifying Valence data, the class LALV is mostly confused with the classes HAHV and LAHV; LAHV is mostly confused with HALV and HAHV; HALV is confused with LAHV and LALV; and HAHV is highly confused with LAHV and HALV.

Based on Fig. 4, when classifying Arousal data, the class LALV is mostly confused with HAHV and HALV; LAHV is mostly confused with LALV and HAHV; HALV is confused with LALV and LAHV; and HAHV is highly confused with HALV and LALV.

As Table 6 shows, the classification performance of the JS-RBFNN increases notably after reducing the input space from 8064 to 4000 dimensions. When the training data include redundant information, a classifier has difficulty learning from them; we therefore use the LLE dimensionality reduction technique to remove this redundancy. With the cleaner training data, a higher performance can be reached, as shown in Tables 5 and 6. Figures 5 and 6 show the confusion matrices of the JS-RBFNN after dimensionality reduction of the Valence and Arousal datasets. In these figures, classes 1, 2, 3, and 4 represent LALV, LAHV, HALV, and HAHV, respectively.

Fig. 5 Confusion matrix of JS-RBFNN on the Valence dataset using LLE dimensionality reduction

Fig. 6 Confusion matrix of JS-RBFNN on the Arousal dataset using LLE dimensionality reduction

After applying LLE and the proposed JS-RBFNN, the confusion rates between the different classes decrease considerably.

As can be seen from the above tables and figures, the proposed method increases the accuracy of EEG signals classification through its two innovations: the jellyfish meta-heuristic algorithm finds good centers for the Gaussian functions of the radial basis function neural network, and the locally linear embedding algorithm reduces the dimensionality of the EEG signals.

3.3 Comparison of EEG signals classification

In this section, we compare our proposed EEG signals classification method with other methods. Table 7 reports the results of various state-of-the-art EEG signals classification methods on the DEAP dataset and shows that the proposed approach outperforms several existing approaches by a relatively wide margin. We used 300 hidden neurons and 5000 epochs to train our proposed JS-RBFNN classifier; since the DEAP dataset contains 1000 samples and 30% of the data determines the number of hidden neurons, we set it to 300.

Table 7 Classification accuracy comparisons of different methods on the DEAP dataset

From the results, considering the overall EEG signals classification performance, our proposed method outperforms the other methods. Of the methods used for comparison, the DLN model [26], DNN model [13], CNN model [13], and H-AVE-BGRU [28] are based on deep learning techniques. These approaches are generally superior to approaches based on classical machine learning methods, such as the Bayesian classifier proposed in [83 and 84] and the 1-NN classifier proposed in [25]. H-AVE-BGRU is a hierarchical bidirectional Gated Recurrent Unit (GRU) network [28]. Among the classical machine learning methods, the best results, achieved by the fNIRS model proposed in [27], are 73.9% and 66% for Valence and Arousal classification, respectively; this is inferior to some of the deep learning based methods. As Table 7 shows, our proposed method reaches accuracies of 85.2% and 78.2% for Valence and Arousal classification, respectively. The performance of our proposed classifier is nearly 3.8% higher than the CNN model on Valence and nearly 4.9% higher on Arousal. The method proposed by Tripathi et al. [13] produces the second and third best results, with 81.4% and 73.3% accuracy for Valence and Arousal classification, respectively. Consequently, we achieve higher performance with a simpler neural network than the other state-of-the-art methods in the field.

4 Conclusion

In this paper, a new approach for EEG signals classification on the DEAP dataset has been proposed, with two novelties. First, a new radial basis function neural network classifier was proposed; it uses the Jellyfish Search algorithm to determine the centers of the Gaussian functions of the RBFNN. Second, the locally linear embedding (LLE) dimensionality reduction technique was applied to decrease the size of the EEG feature vectors, which also increases the recognition accuracy. The performance of the proposed method was evaluated in various experiments on different datasets. The experimental results show that our proposed method outperforms several existing state-of-the-art methods in EEG signals classification. Future work includes investigating new learning methods for determining the synaptic weights of the RBFNN classifier, as well as enhanced versions of other meta-heuristic algorithms for selecting the centers of the RBFNN classifier.