1 Introduction

Epilepsy is one of the most serious and frequently occurring neuropathological conditions, affecting around fifty million people globally (footnote 1). Epileptic seizures can be lethal [1]. Epileptogenesis is a long-term, dynamic, progressive process of hyperexcitability and abnormal synchronization of brain neurons that culminates in the manifestation of seizures. Available drugs treat epileptogenesis indirectly by suppressing ictogenesis (the expression of seizures) [2]. Treatment of epileptic patients is mostly symptomatic and based on clinical features.

The “epileptic network” can be defined as a distributed network of distinct and distant brain regions that causes hyperexcitability and hypersynchrony in epilepsy [3]. Here we discuss epileptic networks derived computationally from EEG data. Analysis of these networks may help localize a brain-region based signature in epileptic patients [4]. To date, our understanding of epileptogenesis is limited, and epilepsy does not have a cure yet. Moreover, no signature pattern of brain-region based connections is known that can successfully distinguish epileptic patients from normal individuals. The problem becomes more complex when demographic features such as gender or age are incorporated into the EEG data of epileptic patients.

In this article, we differentiate epileptic patients from normal individuals using brain-region connection based signatures derived from EEG data. We also try to answer several questions. How different is male epilepsy from female epilepsy? Do they over-represent different connections among brain regions? How does child epilepsy differ from teenage and adult epilepsy? Do different demographic categories show different patterns of over- or under-representation of connections among brain regions? Can these patterns be used clinically as signatures? In addition, we support our results with findings from a fuzzy rule mining based approach. Each discovered rule is a different combination of the presence or absence of brain-region based connections.

2 Data

The dataset consists of electroencephalography (EEG) data collected from 60 healthy individuals and 80 patients suffering from epilepsy. The data have also been grouped and studied according to gender and age. The dataset contains 31 normal males and 29 normal females, and 43 epileptic males and 37 epileptic females. Subjects younger than 13 years have been defined as children; the dataset contains 23 normal and 41 epileptic children. Teenagers have an age range of 13–19 years; there are 14 normal and 22 epileptic teenagers. Subjects older than 19 years have been defined as adults; there are 23 normal and 17 epileptic adults.
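The age categories and cohort sizes above can be checked with a few lines of Python. This is only an illustrative sketch: the function and dictionary names are hypothetical, and the counts are copied from the text.

```python
def age_group(age_years):
    """Map an age in years to the demographic category defined in the text."""
    if age_years < 13:
        return "child"
    if age_years <= 19:
        return "teenager"
    return "adult"


# Cohort sizes as reported in the text: (normal, epileptic).
BY_GENDER = {"male": (31, 43), "female": (29, 37)}
BY_AGE = {"child": (23, 41), "teenager": (14, 22), "adult": (23, 17)}


def totals(groups):
    """Sum the normal and epileptic counts over all categories."""
    normal = sum(n for n, _ in groups.values())
    epileptic = sum(e for _, e in groups.values())
    return normal, epileptic
```

Both groupings add up to the same 60 normal and 80 epileptic subjects, so the gender and age splits partition the same cohort.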

3 Data Acquisition and Filtering

Data have been collected with a 16-channel computerized EEG machine from Recorders & Medicare Systems Pvt. Ltd. (RMS). The EEG of each subject has been recorded for an interval of 20–30 min. The data of each epileptic patient have been split into 10-second epochs. The internationally accepted Modified Combinatorial Nomenclature (MCN) scheme for the location of electrodes has been followed. According to this scheme, each location is denoted by a combination of letter(s) and a number. The letter(s) identify the position of the electrode on the brain lobes, whereas the number denotes the hemisphere. The frontal polar, frontal, temporal, parietal and occipital lobes are represented by the letters ‘FP’, ‘F’, ‘T’, ‘P’, and ‘O’ respectively, whereas odd and even numbers stand for electrode positions on the left and right hemisphere respectively. The letter ‘C’ is used for identification only. During data collection, all participants have been asked to stay awake and motionless with their eyes wide open. Subsequently, they have been requested to attain a no-thinking state as far as possible. Each recording has included a series of activation procedures, i.e., eye blinking, photic stimulation, and hyperventilation, among others.
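The naming scheme and the epoching step can be illustrated with a short Python sketch. The helper names are hypothetical, and the sampling rate is a parameter because the text does not state it.

```python
import re

# Letter codes from the text; 'C' is used for identification only.
REGIONS = {
    "FP": "frontal polar",
    "F": "frontal",
    "T": "temporal",
    "P": "parietal",
    "O": "occipital",
    "C": "central (identification only)",
}


def parse_electrode(label):
    """Split a label such as 'FP1' into (region, hemisphere)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", label)
    if match is None or match.group(1).upper() not in REGIONS:
        raise ValueError(f"unrecognized electrode label: {label!r}")
    region = REGIONS[match.group(1).upper()]
    # Odd numbers lie on the left hemisphere, even numbers on the right.
    hemisphere = "left" if int(match.group(2)) % 2 == 1 else "right"
    return region, hemisphere


def split_epochs(samples, sampling_rate_hz, epoch_seconds=10):
    """Cut one channel into consecutive full epochs; a trailing partial epoch is dropped."""
    size = int(sampling_rate_hz * epoch_seconds)
    n_full = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(n_full)]
```

Dropping the trailing partial epoch is a design choice of this sketch, not something the text specifies.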

After the EEG data have been recorded, the noise has first been removed manually by experienced neurotechnologists. We have then used the EEGLAB toolbox version 13 [5], implemented in MATLAB R2015a, for further filtering. The EEGLAB plugin CleanLine has been used to remove sinusoidal noise from the raw EEG data. The resultant data have been filtered again using a Finite Impulse Response (FIR) filter with a pass band of 4–60 Hz to remove sleep waves and noise due to electrical circuits. Next, we have used Independent Component Analysis (ICA) by applying the Runica algorithm [6], which decomposes a multivariate signal into its additive, independent, non-Gaussian components. We have separated the most likely components from a number of noisy components using neural networks. Lastly, a final manual check has been done to ensure that artifacts have been removed from the data.
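The 4–60 Hz band-pass step can be illustrated in plain Python. The sketch below is a stand-in for the EEGLAB filter, not a reproduction of it: it builds a Hamming-windowed sinc FIR filter, and the sampling rate (256 Hz in the usage note) and the number of taps (257) are assumptions that the text does not provide.

```python
import math


def bandpass_fir(low_hz, high_hz, fs_hz, num_taps=257):
    """Hamming-windowed sinc band-pass coefficients (num_taps should be odd)."""
    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

    mid = (num_taps - 1) / 2
    f1, f2 = low_hz / fs_hz, high_hz / fs_hz  # cutoffs in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal band-pass = low-pass at f2 minus low-pass at f1.
        ideal = 2 * f2 * sinc(2 * f2 * k) - 2 * f1 * sinc(2 * f1 * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def apply_fir(signal, taps):
    """Filter a signal, keeping only outputs where the filter fully overlaps it.

    The taps are symmetric, so sliding them without reversal equals convolution.
    """
    n_out = len(signal) - len(taps) + 1
    return [sum(t * signal[i + j] for j, t in enumerate(taps)) for i in range(n_out)]


def rms(values):
    """Root mean square amplitude."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```

With an assumed 256 Hz sampling rate, `bandpass_fir(4, 60, 256)` strongly attenuates a 1 Hz sine wave while passing a 20 Hz sine wave almost unchanged, which matches the stated purpose of removing slow sleep waves.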

4 Methodology

We have divided the methodology into four different steps. The steps have helped in selecting significant features and in finding important rules for epileptic patients in general and for different demographic categories. Figure 1 depicts the flowchart of our methodology pipeline.

Fig. 1. Flow chart of the methodology.

  1. Brain connectivity network generation: We have calculated the correlation between each pair of electrode positions using Pearson’s correlation coefficient [7]. We have considered the 16 electrode positions as vertices. If two vertices are found to be correlated (positively or negatively), we have created an edge between them. Thus, we have developed the brain connectivity networks of normal individuals and epileptic patients. A detailed explanation of this step can be found in our previous work [8].

  2. Determination of maximum features: We have developed a \(16\times 16\) adjacency matrix for each brain connectivity network representing either a healthy volunteer or an epileptic patient. This adjacency matrix is symmetric with ‘0’s on its diagonal. Thus, at most \(\frac{16\times 15}{2} = 120\) unique undirected connections are possible in such a matrix. These 120 connections have been considered as features.

  3. Feature selection and classification: We have identified the key features discriminating normal individuals from epileptic patients (a two-class problem), male epileptic patients from female epileptic patients (another two-class problem) and child epileptic patients from teenage as well as adult epileptic patients (a three-class problem). We have used ten well-established feature selection algorithms, namely mRMR [9], Fisher scoring [10], t-test, Gini index, ReliefF, Support Vector Machine (SVM) [11], gain ratio, chi-square, fuzzy entropy measures with similarity classifier [12], and symmetrical uncertainty based approaches [13,14,15], to identify the twenty most significant and the twenty least significant features for each of the above classification problems. We have selected those most/least significant features whose frequency of occurrence over the ten algorithms is more than 60%. We have used eight well-known classification algorithms, namely radial basis function neural network [16], random forest [17], SVM [11], multilayer perceptron, logistic regression, Bayesian logistic regression, rotation forest [18], and a regression method, to classify combinations of the minimum number of most and least significant features. The aim has been to maximize the average classification accuracy of the most significant features and to minimize it for the least significant features. Thus, a combination of minimum features has been selected for each of the classification problems mentioned earlier.

  4. Fuzzy rule mining: We have used the Fuzzy Unordered Rule Induction Algorithm (FURIA) [19] to identify a few rules that differentiate the aforementioned classes. Here, we have considered all 120 features as input to FURIA. In addition, we have calculated the Certainty Factor (CF) for each rule. CF lies in the range [-1, 1]. If the antecedent and consequent are related, the value of CF becomes positive. A higher value of CF represents a more significant rule.
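Steps 1 to 3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the edge criterion is detailed in [8] and not in this text, so the `threshold` on the absolute correlation is a hypothetical parameter, and the function names are invented for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def connectivity_features(signals, threshold):
    """Return {(a, b): 0 or 1} for every unordered electrode pair.

    signals maps an electrode name to its samples. An edge (1) is created
    when the correlation is strong in either direction (assumed criterion).
    """
    names = sorted(signals)
    return {
        (a, b): int(abs(pearson(signals[a], signals[b])) >= threshold)
        for a, b in combinations(names, 2)
    }


def consensus_features(rankings, top_k=20, min_share=0.6):
    """Keep features found in the top_k of more than min_share of the rankings.

    rankings holds one feature list per selection algorithm, best first.
    """
    counts = {}
    for ranking in rankings:
        for feature in ranking[:top_k]:
            counts[feature] = counts.get(feature, 0) + 1
    return sorted(f for f, c in counts.items() if c / len(rankings) > min_share)
```

For 16 electrodes, `connectivity_features` yields the 120 pairwise features of step 2. With ten rankings, `consensus_features` keeps a feature only if at least seven algorithms place it in their top twenty, since the 60% cut-off is strict.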

5 Results and Discussion

We have found certain brain-region based connections (features) to be over-represented and a few to be under-represented when comparing epileptic patients with normal volunteers. The features C3-F3, F7-O2, F7-O1, T7-F7, F8-FP1, P8-P4 and P8-P7 have been found to be over-represented in epilepsy. On the other hand, the features P7-P4 and T8-F3 have been found to be under-represented. The combination of these nine significant features (Table 1) has shown 82.59% average classification accuracy with 20-fold cross-validation in separating normal individuals from epileptic patients in general. The representation pattern of these features constitutes a generalized epileptic signature.
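The 20-fold cross-validation protocol can be sketched as follows. The text does not say how the folds were formed, so the shuffling, the seed, and the `fit_predict` callback are assumptions made for illustration only.

```python
import random


def k_fold_indices(n_samples, k=20, seed=0):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(features, labels, fit_predict, k=20, seed=0):
    """Average accuracy over k folds.

    fit_predict(train_x, train_y, test_x) is a hypothetical callback that
    trains any classifier and returns predicted labels for test_x.
    """
    accuracies = []
    for train, test in k_fold_indices(len(labels), k, seed):
        predictions = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

With the 140 subjects of this study, each of the 20 folds holds out 7 subjects and trains on the remaining 133.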

In a similar way, we have found a few over-represented and under-represented features when comparing male and female epileptic patients. In female epileptic patients, the features P4-F4 and P8-C4 have been found to be over-represented, whereas the features O2-C3, O2-P3, P7-O2 and F8-F7 have been found to be under-represented. A combination of these six significant features (Table 1) has shown 70.94% average classification accuracy with 20-fold cross-validation in separating male from female epileptic patients. The over- or under-representation pattern of these features constitutes the gender-specific epileptic signatures.

On the other hand, the combination of eight promising features (Table 1) has shown 62.14% average classification accuracy in discriminating child epileptic patients from teenage and adult epileptic patients. In child epilepsy, the feature F8-F3 has been found to be over-represented, whereas the features P8-FP1 and P8-T8 have been found to be under-represented. In teenage epilepsy, we have found the feature O2-F3 to be over-represented and the feature F7-P4 to be under-represented. The features P8-T8 and O2-F4 have been found to be over-represented in adult epilepsy. The representation pattern of these features constitutes the age-specific epileptic signatures. The rest of the features seem inconclusive at this point according to their frequency of occurrence. However, the average classification accuracy may increase with a larger sample size.

Table 1. List of significant features differentiating epilepsy from normal individuals; male from female epilepsy; and child from teenage and adult epilepsy. Odd numbers following node names indicate the left hemisphere of the brain, and even numbers indicate the right hemisphere. The frequency of occurrence is given in percent.

Table 2. Fuzzy rules with their certainty factors (CF) for differentiating epilepsy from normal individuals; male from female epilepsy; and child from teenage and adult epilepsy. The range of CF is [-1, 1].

To support the aforementioned results, a fuzzy rule based association mining study has been carried out on all 120 possible features (connections among different nodes) to find unique rules, which are given in Table 2. Besides supporting our earlier findings, this study has provided additional associated features that help define proper rules for the general, age-wise and gender-wise epileptic signatures. Some features, namely T7-F7 and P8-P7, have been found to be associated with epilepsy, as described earlier in Table 1. The presence of T7-F7 has been found coupled with the presence of two additional features, P7-F3 and P3-FP1; this rule indicates epilepsy with a certainty factor of 0.95. The presence of P8-P7 has been found together with the presence of a new feature, C4-FP1, with a certainty factor of 0.93.
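The text does not give the formula used for the certainty factor. One definition that is common in association rule mining, and that matches the stated range of [-1, 1] with positive values when antecedent and consequent are positively related, is sketched below. It should be read as an assumption, not as the authors' exact computation; the function and parameter names are hypothetical.

```python
def certainty_factor(confidence, consequent_support):
    """Certainty factor of a rule 'antecedent -> consequent' (assumed definition).

    confidence: fraction of subjects matching the antecedent that belong
        to the consequent class.
    consequent_support: fraction of all subjects in the consequent class.
    """
    if not 0.0 < consequent_support < 1.0:
        raise ValueError("consequent_support must lie strictly between 0 and 1")
    if confidence > consequent_support:
        # The rule raises belief in the class: scale by the room left to grow.
        return (confidence - consequent_support) / (1.0 - consequent_support)
    if confidence < consequent_support:
        # The rule lowers belief in the class: scale by the prior itself.
        return (confidence - consequent_support) / consequent_support
    return 0.0
```

Under this definition, a rule whose antecedent is matched only by epileptic subjects has a certainty factor of 1, and a rule that is no better than the class prior (80 of 140 subjects for epilepsy in this dataset) has a certainty factor of 0.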

Moreover, multiple rules have been found for the feature P7-P4. The absence of this feature along with the absence of two additional features, P8-F7 and O2-C4, has been found to be associated with epilepsy with a certainty factor of 0.97. In contrast, the presence of P7-P4 along with the absence of three additional features, P4-FP2, T7-F7, and F8-P3, has been found in normal individuals with a certainty factor of 0.96. Also, the presence of the same feature along with the absence of three additional features, O2-FP2, T7-F7, and P3-C3, has been found in normal individuals with a certainty factor of 0.96. In both cases, the absence of T7-F7 is a common factor. Such findings reflect the variable states of brain regions even for a very specific disease state.

The feature F7-O2 has been found to be associated with both epileptic patients and normal individuals. The presence of this feature along with the absence of two additional features, O2-FP2 and O2-F4, has been associated with epilepsy with a certainty factor of 0.96. In contrast, the absence of the feature along with the absence of two additional features, T7-P3 and P7-F7, has been seen in normal individuals with a certainty factor of 0.92. Together, these rules contribute to the generalized epileptic signature.
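The rules for the generalized signature described above can be written down as presence/absence patterns and checked against a subject's set of connections. The sketch below uses crisp matching as a simplification of the fuzzy rules, and the data layout and function names are hypothetical; the connections and certainty factors are the ones quoted in the text.

```python
def edge(label):
    """'T7-F7' and 'F7-T7' denote the same undirected connection."""
    a, b = label.split("-")
    return frozenset((a, b))


# (connections that must be present, connections that must be absent, class, CF)
RULES = [
    (["T7-F7", "P7-F3", "P3-FP1"], [], "epilepsy", 0.95),
    (["P8-P7", "C4-FP1"], [], "epilepsy", 0.93),
    ([], ["P7-P4", "P8-F7", "O2-C4"], "epilepsy", 0.97),
    (["P7-P4"], ["P4-FP2", "T7-F7", "F8-P3"], "normal", 0.96),
    (["P7-P4"], ["O2-FP2", "T7-F7", "P3-C3"], "normal", 0.96),
    (["F7-O2"], ["O2-FP2", "O2-F4"], "epilepsy", 0.96),
    ([], ["F7-O2", "T7-P3", "P7-F7"], "normal", 0.92),
]


def fired_rules(connections):
    """Return (class, CF) for every rule satisfied by the given connections,
    ordered from the highest to the lowest certainty factor."""
    present = {edge(c) for c in connections}
    hits = []
    for must_have, must_lack, label, cf in RULES:
        has_all = all(edge(c) in present for c in must_have)
        lacks_all = not any(edge(c) in present for c in must_lack)
        if has_all and lacks_all:
            hits.append((label, cf))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Rules that consist only of absences fire readily on sparse networks, so several rules of different classes can match the same subject. How such conflicts are resolved is not described in the text and is left open here.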

In the case of male versus female epileptic patients, the presence of the feature O2-P3 along with the absence of an additional feature, O1-FP1, has been found to be associated with male epilepsy with a certainty factor of 0.91. Similarly, the presence of the feature F8-T7 along with the presence of the additional feature O1-FP1 has been found to be associated with male epilepsy with a certainty factor of 0.92. In contrast, the presence of the feature P4-F4 along with the presence of an additional feature, P8-P3, has been found to be associated with female epilepsy with a certainty factor of 0.89. The absence of the feature F8-F7 along with the presence of two additional features, P8-F7 and P3-C3, has also been found to be associated with female epilepsy. These results support our earlier findings on features in male and female epilepsy given in Table 1. These rules contribute to the gender-specific epileptic signatures.

We have also carried out an age-specific fuzzy rule mining study, as given in Table 2. Here, the presence of F8-F3 and the absence of P8-FP1 have been found in two separate rules, which are associated with child epilepsy with certainty factors of 0.94 and 0.92 respectively. The absence of P8-FP1 along with the presence of two additional features, P7-P4 and F7-P4, has been found to be associated with teenage epilepsy with a certainty factor of 0.86. The presence of the feature P8-T8 along with the absence of two additional features, F8-P4 and T8-FP1, has been found to be associated with adult epilepsy with a certainty factor of 0.8. These results support our earlier findings given in Table 1 and contribute to the age-specific epileptic signatures.

Moreover, the fuzzy rule mining study has discovered some new rules that we had not spotted in our earlier findings. We have found two new rules each for epileptic patients and normal individuals. One new rule related to male epilepsy and two related to female epilepsy have also been found, as given in Table 2. In addition, two new rules associated with child epilepsy, one with teenage epilepsy and two with adult epilepsy have been found. These new rules, together with our earlier findings (Table 1), have helped in creating well-defined signatures for future machine learning based detection of epilepsy in general and in complicated cases.

6 Conclusions

In this paper, we have found a generalized epileptic signature from the EEG data of patients, along with gender-specific and age-specific signatures. It is not always easy for medical practitioners to correctly identify an epileptic patient, because similar EEG spikes are found in other neurobiological disorders. The epileptic signatures created here may help them overcome this hurdle in detecting epilepsy. Moreover, such predictions require minimal human intervention and can be useful in peripheral areas where proper medical facilities are not yet available. They can be employed quickly to detect probable epilepsy, after which the patient can be referred for proper medical care. We have also been able to distinguish between patients of different age groups and genders, but this has not yet been clinically tested. According to clinicians, identifying gender-specific epileptic signatures is an interesting concept and may be helpful in the future.