
1 Introduction

Sleep stage classification is of great importance for sleep quality assessment and for the diagnosis and treatment of sleep disorders [1]. Manual sleep staging is performed by experts based on polysomnography (PSG), which records a combination of electroencephalogram (EEG), electrooculogram (EOG), electrocardiogram (ECG), electromyogram (EMG), and other signals. This process is not automated and is highly subjective, which motivates automatic, objective sleep staging by computer. Following the widely used R&K [2] and AASM [3] sleep staging criteria, previous studies have combined EEG, EMG, and EOG signals from PSG to classify 6 stages (Wake, REM (rapid eye movement), S1–S4) or 5 stages (Wake, REM, N1–N3) [1, 4]. Among these neurophysiological signals, EEG is regarded as a particularly effective signal for assessment [5]. However, multi-channel sleep monitoring devices are difficult to popularize because of their high cost and poor portability, which makes sleep stage classification based on single-channel EEG very important.

Previous single-channel EEG sleep staging studies have used the Pz-Oz channel [5, 6], the Fpz-Cz channel [7, 8], or both channels [9]. Some researchers reported that Pz-Oz can replace Fpz-Cz to obtain better classification results [6], while others concluded that Fpz-Cz is the better channel [8]. It therefore remains unclear which channel is optimal for sleep stage classification. Using single-channel EEG signals, these studies adopted SVM, random forest, and AdaBoost classifiers [6, 7, 9]. Unlike these classifiers, sparse representation classification (SRC) represents each test sample as a sparse combination of the training data to achieve classification [10], and it has been successfully applied to EEG signals. Yu used SRC to detect vigilance from normal EEG signals, with a classification accuracy of 94.22% [11]. SRC has also been used to detect abnormal EEG [12] and in brain-computer interface applications [10, 13]. Additionally, Liu used sparse representation and collaborative representation to extract features, compared sleep stage classification performance using 78-dimensional features from two EEG channels, and obtained 80.47% accuracy [14]. However, SRC is still rarely used as a classifier in sleep staging research.

An important step of SRC is dictionary learning, for which the most commonly used method is the K-SVD (K-Singular Value Decomposition) algorithm. Proposed by Aharon et al. [15], K-SVD is an iterative algorithm that alternates between sparse coding with the current dictionary and updating the dictionary atoms. Liu used the K-SVD algorithm to construct a complete dictionary for distinguishing different brain tasks from activated brain sources and achieved good results [16]. A previous study demonstrated that the K-SVD algorithm performs well in classifying neurophysiological signals [17], but it has not been reported for sleep stage classification.

In this study, we introduce SRC into single-channel EEG sleep stage classification and compare the performance of the Pz-Oz and Fpz-Cz channels using only a few features. First, the sample entropy of the EEG signal and the variance and kurtosis of the EEG rhythms in each epoch are extracted as features. Then the K-SVD algorithm is used to train a dictionary for each sleep stage, whose size depends on the number of epochs in that stage. Finally, test epochs are classified according to the reconstruction residuals computed from their coding coefficients, and the classification performance of the two channels is compared.

2 Materials and Method

2.1 Data Description

The publicly available Sleep-EDF dataset, downloaded from the PhysioNet website [18], is used in this study. The EEG recordings used here come from four healthy subjects aged 21 to 35 years. Each recording contains two EEG channels, Fpz-Cz and Pz-Oz, and the data recorded from 10:00 pm to 7:00 am the next day are used. The EEG signals, sampled at 100 Hz, were divided into 30-s segments called epochs. Each epoch was originally labeled as AWA (wake), S1, S2, S3, S4, REM (rapid eye movement), MVT (movement time), or UNS (unknown state).
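For reference, the epoch arithmetic works out as follows: at 100 Hz, one 30-s epoch holds 30 × 100 = 3000 samples, and the 9-h window from 10:00 pm to 7:00 am holds 9 × 3600 / 30 = 1080 epochs per channel. The minimal NumPy sketch below splits a signal into such epochs. It assumes the channel has already been loaded from the EDF file into a 1-D array (loading is not shown), and `segment_epochs` is a hypothetical helper, not part of any toolbox used by the authors.

```python
import numpy as np

FS = 100          # Sleep-EDF EEG sampling rate (Hz)
EPOCH_SEC = 30    # scoring epoch length (s)


def segment_epochs(signal, fs=FS, epoch_sec=EPOCH_SEC):
    """Split a 1-D signal into non-overlapping epochs of shape (n_epochs, fs * epoch_sec).

    Trailing samples that do not fill a complete epoch are dropped.
    """
    samples_per_epoch = fs * epoch_sec
    n_epochs = len(signal) // samples_per_epoch
    return np.asarray(signal)[: n_epochs * samples_per_epoch].reshape(n_epochs, samples_per_epoch)


# Synthetic stand-in for a 9-hour recording (10:00 pm to 7:00 am)
night = np.random.default_rng(0).standard_normal(9 * 3600 * FS)
epochs = segment_epochs(night)
print(epochs.shape)  # (1080, 3000)
```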

2.2 Feature Extraction

For each channel, all EEG data are filtered with a 0.4–35 Hz FIR band-pass filter. MVT and UNS epochs are discarded because there are very few of them, leaving 4318 epochs in total.
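The filter order and design method are not reported. As a sketch only, the code below assumes a 501-tap window-method FIR filter designed with SciPy's `firwin` (Hamming window by default) and applied forward and backward with `filtfilt` for zero phase; the tap count, the zero-phase application, and the helper name `bandpass_eeg` are assumptions.

```python
import numpy as np
from scipy.signal import filtfilt, firwin

FS = 100.0  # Sleep-EDF EEG sampling rate (Hz)


def bandpass_eeg(x, low=0.4, high=35.0, numtaps=501, fs=FS):
    """Zero-phase FIR band-pass filter (window method, firwin's default Hamming window)."""
    # Many taps are needed: the transition width of a window-method design shrinks
    # only in proportion to 1 / numtaps, and the 0.4 Hz lower edge is close to DC.
    taps = firwin(numtaps, [low, high], pass_zero=False, fs=fs)
    # filtfilt applies the filter forward and then backward, so the net phase shift is zero.
    return filtfilt(taps, [1.0], x)


t = np.arange(3000) / FS                          # one 30-s epoch (3000 samples)
clean = np.sin(2 * np.pi * 10 * t)                # 10 Hz: inside the passband
noisy = clean + 5.0 + np.sin(2 * np.pi * 45 * t)  # DC offset and 45 Hz: outside it
filtered = bandpass_eeg(noisy)
```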

For each epoch, a 5-level wavelet packet decomposition with the db4 wavelet is performed to extract six EEG rhythm waves, D1–D6 (Table 1). We then calculate the variance and kurtosis of each rhythm wave; kurtosis captures sharp rises or falls within a rhythm. The sample entropy of each epoch is also calculated as a feature. This gives 6 × 2 + 1 = 13 features per epoch, which are finally normalized to [0, 1] (Fig. 1).
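The wavelet packet frequency bands of Table 1 are not reproduced here, so the sketch below assumes that the six rhythm waves D1–D6 of an epoch have already been reconstructed as 1-D arrays, and computes the remaining features and the [0, 1] normalization with NumPy. The sample entropy parameters (embedding dimension m = 2, tolerance r = 0.2 × standard deviation), the use of excess kurtosis, the feature order, and min-max scaling of each feature across epochs are all assumptions, since the text does not state them. With min-max scaling, the choice between excess and ordinary kurtosis does not matter, because the constant offset of 3 cancels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy -ln(A/B) with Chebyshev distance and tolerance r = r_factor * std(x).

    Returns inf when no matches are found (sample entropy is then undefined).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)
    # Use the same number of templates (n - m) for lengths m and m + 1.
    tm = sliding_window_view(x, m)[: n - m]
    tm1 = sliding_window_view(x, m + 1)
    b = a = 0
    for i in range(n - m - 1):
        # Compare template i only with later templates, so self-matches are excluded.
        b += np.count_nonzero(np.max(np.abs(tm[i + 1:] - tm[i]), axis=1) <= r)
        a += np.count_nonzero(np.max(np.abs(tm1[i + 1:] - tm1[i]), axis=1) <= r)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0


def epoch_features(epoch, rhythms):
    """13-D vector: variances of D1-D6, kurtoses of D1-D6, sample entropy (order assumed)."""
    variances = [np.var(r) for r in rhythms]
    kurtoses = [excess_kurtosis(r) for r in rhythms]
    return np.array(variances + kurtoses + [sample_entropy(epoch)])


def minmax_normalize(features):
    """Scale each column of an (n_epochs, n_features) matrix to [0, 1]."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant features map to 0 instead of NaN
    return (features - lo) / span

# Usage (hypothetical data layout): one row per epoch, then normalize column-wise, e.g.
# F = minmax_normalize(np.vstack([epoch_features(e, rh) for e, rh in zip(epochs, rhythm_sets)]))
```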

Table 1. Frequency ranges in 5-layer wavelet packet decomposition.
Fig. 1.

Normalized feature values. D1–D6 denote the rhythm waves, Var the variance, Kur the kurtosis, and SampEn the sample entropy.

2.3 Dictionary Training Using K-SVD

SRC consists of two steps: dictionary training and classification by coding coefficients. A given training set \( \varvec{Y}_{train} \) can be represented by a dictionary D, which contains its essential information, and a corresponding sparse coefficient matrix X, as in Eq. (1):

$$ \varvec{Y}_{{\varvec{train}}} = \varvec{DX} $$
(1)

Dictionary training with the K-SVD algorithm alternates between two parts: sparse coding and dictionary update. The algorithm proceeds as follows.

  (a)

    Initialization: choose an initial dictionary \( \varvec{D}^{\left( 0 \right)} \in {\text{R}}^{{{\text{m}} \times {\text{K}}}} \) for the training set \( \varvec{Y} \in {\text{R}}^{{{\text{m}} \times {\text{N}}}} \). Set i = 1 and X = 0, and fix the maximum number of iterations I.

  (b)

    Sparse coding: compute the sparse coefficient matrix X(i) with the Orthogonal Matching Pursuit (OMP) algorithm, which approximately solves:

    $$ \mathop {\hbox{min} }\limits_{X} \left\| {\varvec{Y} - \varvec{D}^{\left( i \right)} \varvec{X}^{\left( i \right)} } \right\|_{F}^{2} ,s.t.\left\| {x_{n} } \right\|_{0} \le T,n \in \left\{ {1,2, \ldots ,N} \right\} $$
    (2)

where T is the maximum number of nonzero coefficients allowed in each sparse code \( x_{n} \) (the n-th column of X), and N is the number of training samples. D(i) and X(i) are the dictionary and the sparse coefficient matrix at the i-th iteration.

  (c)

    Dictionary update: keep X(i) and all columns of D(i) except the k-th fixed, and update the k-th column \( d_{k} \). Let \( x_{T}^{k} \) denote the k-th row of X(i), i.e., the coefficients that multiply \( d_{k} \). The objective can then be rewritten as follows:

    $$ \left\| {\varvec{Y} - \varvec{D}^{\left( i \right)} \varvec{X}^{\left( i \right)} } \right\|_{F}^{2} = \left\| {{\text{Y}} - \sum\limits_{j = 1}^{K} {d_{j} x_{T}^{j} } } \right\|_{F}^{2} = \left\| {\left( {\varvec{Y} - \sum\limits_{j \ne k} {d_{j} x_{T}^{j} } } \right) - d_{k} x_{T}^{k} } \right\|_{F}^{2} = \left\| {E_{k} - d_{k} x_{T}^{k} } \right\|_{F}^{2} $$
    (3)

In Eq. (3), \( \varvec{D}^{\left( i \right)} \varvec{X}^{\left( i \right)} \) is decomposed into a sum of K rank-1 matrices. The K − 1 terms with j ≠ k are held fixed, and the k-th term is the one to be updated. The matrix \( E_{k} \) is the error over all N training samples when \( d_{k} \) is removed. The singular value decomposition (SVD) of \( E_{k} \) is computed as follows:

$$ E_{k} = U_{k} \Delta_{k} V_{k}^{T} $$
(4)

The atom \( d_{k} \) is replaced by the first column of \( U_{k} \), i.e., the left singular vector associated with the largest singular value of \( E_{k} \) (\( k = 1, 2, \ldots, K \), with \( K < N \)).

  (d)

    Repeat step (c) for k = 1, 2, …, K so that all K atoms are updated by SVD. Then set i = i + 1; if i = I, terminate and output D; otherwise, return to step (b).
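Steps (a)–(d) can be sketched in NumPy as below. This is an illustrative implementation under stated assumptions, not the authors' code: the initial dictionary is taken from randomly chosen, normalized training samples, atoms are kept at unit norm, and, as in Aharon et al.'s original K-SVD, the error matrix \( E_{k} \) is restricted to the training samples whose codes actually use atom k, with the matching row of coefficients updated together with \( d_{k} \). The text above does not state whether this restriction was applied. The function names `omp` and `ksvd` are hypothetical.

```python
import numpy as np


def omp(D, Y, T):
    """Orthogonal Matching Pursuit: code each column of Y with at most T atoms of D.

    D is (m, K) with unit-norm columns, Y is (m, N); returns X of shape (K, N).
    """
    K, N = D.shape[1], Y.shape[1]
    X = np.zeros((K, N))
    for n in range(N):
        y = Y[:, n]
        residual, support = y.copy(), []
        for _ in range(T):
            k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
            if k in support:                            # residual is (numerically) zero
                break
            support.append(k)
            coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
            residual = y - D[:, support] @ coef         # orthogonal projection step
            if np.linalg.norm(residual) < 1e-10:
                break
        if support:
            X[support, n] = coef
    return X


def ksvd(Y, K, T, n_iter=10, seed=0):
    """K-SVD dictionary learning; returns D (m, K) and X (K, N). Requires K <= N."""
    rng = np.random.default_rng(seed)
    # (a) Initialization (assumed): K distinct training samples scaled to unit norm.
    D = Y[:, rng.choice(Y.shape[1], K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    X = np.zeros((K, Y.shape[1]))
    for _ in range(n_iter):
        X = omp(D, Y, T)                             # (b) sparse coding
        for k in range(K):                           # (c) update atoms one at a time
            users = np.flatnonzero(X[k])             # samples whose code uses atom k
            if users.size == 0:
                continue                             # unused atom is left unchanged
            # E_k restricted to those samples: error with the d_k contribution removed
            E_k = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
            U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
            D[:, k] = U[:, 0]                        # first left singular vector
            X[k, users] = S[0] * Vt[0]               # matching coefficient row
    return D, X                                      # (d) stop after n_iter passes
```

In the per-stage setting described here, `ksvd` would be run separately on each stage's 13 × N_i feature matrix with K = K_i atoms, and the resulting dictionaries concatenated column-wise.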

Using the steps above, a dictionary \( \varvec{D}_{i} \in R^{{13 \times K_{i} }} \) is trained for each sleep stage, and the stage dictionaries are concatenated into one complete dictionary \( \varvec{D} = [\varvec{D}_{AWA}, \varvec{D}_{REM}, \varvec{D}_{S1}, \varvec{D}_{S2}, \varvec{D}_{S3}, \varvec{D}_{S4}] \).

2.4 Classification Based on Coding Coefficients

After the dictionary has been trained, test samples can be classified by their coding coefficients. A test sample \( y \in R^{13 \times 1} \) belonging to sleep stage i should be well approximated by the columns of D associated with class i, i.e., y = Da, where \( a = [0, 0, \ldots, 0, a_{i,1}, a_{i,2}, \ldots, a_{i,K_{i}}, 0, 0, \ldots, 0]^{T} \in R^{K} \) \( (K = \sum K_{i}) \) is the coding coefficient vector whose entries are zero except those associated with the i-th class, and \( K_{i} \) is the size of the i-th class dictionary. Ideally, the nonzero entries of the estimate a are all associated with columns of D from a single class, so that the test sample y can easily be assigned to that class. The sparsest solution of y = Da is given by the following L0-optimization problem:

$$ L_{0} :\ \widehat{\varvec{a}} = \mathop{\arg\min}\limits_{\varvec{a}} \left\| \varvec{a} \right\|_{0} \quad \text{subject to} \quad \varvec{Da} = y $$
(5)

Solving the L0-minimization problem is NP-hard. It is well known that, under suitable conditions, if the solution a is sufficiently sparse, it can instead be recovered by solving the following convex L1-optimization problem with an error tolerance ε:

$$ L_{1} :\ \widehat{\varvec{a}} = \mathop{\arg\min}\limits_{\varvec{a}} \left\| \varvec{a} \right\|_{1} \quad \text{subject to} \quad \left\| \varvec{Da} - y \right\|_{2} \le \varepsilon $$
(6)
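Eq. (6) is a constrained problem. A common, simpler alternative is its Lagrangian form \( \min_{a} \tfrac{1}{2}\left\| Da - y \right\|_{2}^{2} + \lambda \left\| a \right\|_{1} \); for a suitably chosen λ the two forms have the same solution. The sketch below solves that Lagrangian form with the iterative shrinkage-thresholding algorithm (ISTA). This swapped-in solver, the value of λ, and the iteration count are assumptions, not the method stated in the text.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(D, y, lam=0.01, n_iter=500):
    """Minimize 0.5 * ||D a - y||_2^2 + lam * ||a||_1 by iterative shrinkage-thresholding."""
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient (largest singular value squared)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)        # gradient of the least-squares term
        a = soft_threshold(a - grad / L, lam / L)
    return a
```

A larger λ yields sparser coefficient vectors; in practice λ would be tuned so that each test sample is coded by only a handful of atoms.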

Problem (6) is convex and can be solved efficiently, for example as a second-order cone program. For classification, let \( \delta_{i} \left( \varvec{a} \right) \in R^{K} \) be the vector that keeps the entries of a associated with the i-th class and sets all other entries to zero. The test sample is then reconstructed using only the coefficients of each class, and the reconstruction residual of class i is calculated as follows:

$$ r_{i} \left( y \right) = \left\| {y - \varvec{D}\delta_{i} \left( \varvec{a} \right)} \right\|_{2} $$
(7)

Finally, the test sample is assigned to the stage with the smallest residual: \( \text{identity}(y) = \mathop{\arg\min}\limits_{i} r_{i} \left( y \right) \).
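Given the coding coefficient vector a of a test sample from any sparse solver, the residual rule above and the final decision take only a few lines. In the sketch below, `atom_labels` is an assumed bookkeeping array giving the stage of each column of the concatenated dictionary D, the function names are hypothetical, and the toy dictionary and coefficients are illustrative values, not data from this study.

```python
import numpy as np


def class_residuals(D, atom_labels, y, a):
    """r_i(y) = ||y - D delta_i(a)||_2 for every stage label i."""
    residuals = {}
    for label in np.unique(atom_labels):
        delta = np.where(atom_labels == label, a, 0.0)  # keep only class-i coefficients
        residuals[label] = np.linalg.norm(y - D @ delta)
    return residuals


def identify_stage(D, atom_labels, y, a):
    """Assign y to the stage whose coefficients reconstruct it with the smallest residual."""
    r = class_residuals(D, atom_labels, y, a)
    return min(r, key=r.get)


D = np.eye(4)                                 # toy dictionary: atoms 0-1 "AWA", atoms 2-3 "S2"
atom_labels = np.array(["AWA", "AWA", "S2", "S2"])
y = np.array([0.9, 0.1, 0.05, 0.0])
a = np.array([0.9, 0.1, 0.05, 0.0])           # coefficients from any sparse coder
print(identify_stage(D, atom_labels, y, a))   # AWA
```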

We perform 2- to 6-stage sleep stage classification for the Fpz-Cz and Pz-Oz channels separately (Table 2), using 10-fold cross-validation in which all 4318 epochs are randomly divided into 10 subsets of approximately equal size.
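The random 10-fold split can be sketched as follows. Since 4318 = 10 × 431 + 8, eight folds contain 432 epochs and two contain 431. The random seed and the helper names are assumptions; the text does not say how the split was randomized.

```python
import numpy as np


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Randomly partition sample indices into n_folds subsets of near-equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


def cross_validation_splits(folds):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


folds = ten_fold_indices(4318)
print([len(f) for f in folds])  # [432, 432, 432, 432, 432, 432, 432, 432, 431, 431]
```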

Table 2. Stages included in each classification scheme (2 to 6 stages); the 6-stage scheme corresponds to the R&K standard.

3 Results

3.1 The Coding Coefficient of Test Samples

For each test sample, the 13 original features can be represented by no more than 5 nonzero coding coefficients on the atoms of the corresponding stage's dictionary, while the coefficients for all other stages are zero (Fig. 2).

Fig. 2.

The features (left) and coding coefficients (right) of each sleep stage.

For each stage, the coding coefficients of the test samples fall almost entirely on that stage's part of the dictionary. For the S1 stage, some coding coefficients appear in the wake and S2 parts, and the overlap between S3 and S4 is evident (Fig. 3).

Fig. 3.

Coding coefficients of the whole test set. The black dotted lines mark the boundaries between the stage dictionaries.

3.2 Overall Accuracy, Precision and Recall

Table 3 shows the confusion matrix for 6-stage classification. Three evaluation metrics are obtained from the confusion matrix: overall accuracy (OA), precision, and recall. They are calculated as follows:

Table 3. Confusion matrix of 6-stage classification. Gray background represents the Pz-Oz channel and white background the Fpz-Cz channel.
$$ OA = \frac{\sum\nolimits_{i = 1}^{Q} M_{ii}}{\sum\nolimits_{i = 1}^{Q} \sum\nolimits_{j = 1}^{Q} M_{ij}}, \quad P_{i} = \frac{M_{ii}}{\sum\nolimits_{j = 1}^{Q} M_{ji}}, \quad R_{i} = \frac{M_{ii}}{\sum\nolimits_{j = 1}^{Q} M_{ij}} $$
(8)

where Q is the number of stages, \( P_{i} \) and \( R_{i} \) are the precision and recall of the i-th class, and \( M_{ij} \) is the element in the i-th row and j-th column of the confusion matrix.
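The three metrics in Eq. (8) can be computed directly from a confusion matrix. The sketch assumes rows index the expert-scored stages and columns the predicted stages, which is the layout implied by the precision and recall formulas; the 2 × 2 matrix in the example is hypothetical, not Table 3.

```python
import numpy as np


def metrics_from_confusion(M):
    """Return OA, per-class precision P_i, and per-class recall R_i.

    Assumed layout: M[i, j] = number of epochs of true stage i predicted as stage j.
    A stage that is never predicted gets a NaN precision (0 / 0).
    """
    M = np.asarray(M, dtype=float)
    diag = np.diag(M)
    oa = diag.sum() / M.sum()
    precision = diag / M.sum(axis=0)  # column sums: all epochs predicted as stage i
    recall = diag / M.sum(axis=1)     # row sums: all epochs truly in stage i
    return oa, precision, recall


oa, precision, recall = metrics_from_confusion([[50, 10], [5, 35]])
print(oa)  # 0.85
```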

The classification results of the Pz-Oz and Fpz-Cz channels are compared for 2- to 6-stage classification (Tables 4 and 5).

Table 4. Comparison of overall accuracy (OA) between the Pz-Oz and Fpz-Cz channels for 2- to 6-stage classification.
Table 5. Precision (top) and recall (bottom) of each stage for 2- to 6-stage classification. Gray background represents the Pz-Oz channel and white background the Fpz-Cz channel. Bold indicates the better of the two channels.

Next, the sleep data of three subjects are used as the training set and the data of the remaining subject as the test set. The classification result is shown as a hypnogram (Fig. 4).

Fig. 4.

Hypnograms of subject 2 scored by the expert (top) and by the classifier (bottom). The x-axis is the epoch index of subject 2, and the y-axis is the sleep stage.

4 Discussion

Applying SRC with K-SVD dictionary training to single-channel EEG sleep stage classification achieved good performance. The comparison between the two channels showed that the Pz-Oz channel was better suited than the Fpz-Cz channel to single-channel EEG sleep stage classification.

After sparse coding, the 13-dimensional feature vector of each epoch is represented by no more than 5 nonzero coefficients. When the classification is correct, a test sample is linearly represented by the dictionary of the stage to which it belongs: its coding coefficients are nonzero for that stage and zero for all other stages (Fig. 2). Only a small number of samples in the wake and S2 stages are misclassified, so the recall rates of these two stages are relatively high (Fig. 3). The coding coefficients of all test samples show that the misclassified samples appear mainly in the REM and S2 regions, which leads to the low accuracy of S1, because S1 is a transition stage between REM and S2 [19]. Some researchers consider S1 so similar to REM that they merged the two into one stage [7]. For the same reason, S3 and S4 are difficult to separate, which is why they are merged into a single stage (N3) in the AASM criteria.

Compared with state-of-the-art studies, sparse representation classification achieved better precision and recall. For 6-stage classification, it achieved better accuracy than [20] with fewer features, especially for the S1, S2, and S4 stages (Table 5), and most of the other staging results are also higher than those reported in the literature. The hypnogram of subject 2 shows that the stages assigned by SRC are almost the same as those scored by the expert (Fig. 4).

The Pz-Oz channel outperforms the Fpz-Cz channel for most sleep stages across the 2- to 6-stage classification schemes, especially for the wake and S2 stages (Table 5). However, which channel is optimal has not yet been established, and experiments on a much larger sample are still needed.

In summary, single-channel EEG sleep stage classification can achieve good performance with SRC using K-SVD dictionary training. This suggests that the approach could be extended to portable equipment for daily sleep monitoring. Combining single-channel EEG with other wearable electrophysiological signals (e.g., heart rate variability, HRV) might further improve sleep staging performance.