Abstract
A kernel attention module (KAM) is presented for EEG-based emotion classification with neural network models. It is shown that KAM can yield more efficient and accurate models using a single-parameter design. This parameter can be read as an interpretable scalar quantity for examining the overall amount of attention applied during deep feature refinement. Extensive experiments on the SEED and DEAP datasets demonstrate the module's performance on subject-dependent classification tasks. On SEED, KAM boosts the backbone model's mean prediction accuracy by more than 3% on some subjects and by more than 1%, on average, across all 15 subjects. On DEAP, the improvement is more pronounced: overall mean accuracy increases by more than 3% over the no-attention baseline, and by 1–2% when benchmarked against various other state-of-the-art attention modules. In addition, the predictive dependence of KAM on its single parameter is examined numerically up to first order. Accompanying analyses and visualization techniques are also proposed for interpreting the attention module's effects and its interaction with the backbone model's predictive behavior. These quantitative results can be explored in greater depth to identify correlations with pertinent clinical neuroscientific observations. Finally, a formal mathematical proof of KAM's permutation equivariance property is included.
Data availability
Both datasets analyzed during the current study can be obtained upon reasonable request at the links below:
Notes
Note: Attention can also be performed via right multiplication, \(v\varphi (q^Tk)\).
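The distinction between the two forms can be sketched numerically. In the snippet below, the elementwise \(\varphi = \tanh\) is a placeholder nonlinearity (not the paper's exact choice): left multiplication mixes the rows of v (e.g., channels), while right multiplication mixes its columns (features).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3                          # e.g. m channels, n features per channel
q, k, v = (rng.standard_normal((m, n)) for _ in range(3))

phi = np.tanh                        # placeholder elementwise nonlinearity

left = phi(q @ k.T) @ v              # (m x m) scores reweight v's rows
right = v @ phi(q.T @ k)             # (n x n) scores reweight v's columns

# Both variants preserve the (m x n) shape of the feature block.
assert left.shape == (m, n) and right.shape == (m, n)
```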
Most of them did not report the number of trainable parameters in their models.
More in-depth and meaningful analysis becomes possible when domain experts identify specific samples as signal templates associated with each label, templates that are broadly recognized within clinical practices/communities as being clinically admissible.
Moreover, the conclusion also holds for other distance-based functions if \(M_k(X;\theta )_{uv}\) is instead designed as \(M_k(X;\theta )_{uv} = [g\circ d](X_{u}, X_{v})\) for a suitable function g of one variable.
References
Giannopoulos P, Perikos I, Hatzilygeroudis I (2018) Deep learning approaches for facial emotion recognition: a case study on FER-2013. In: Advances in hybridization of intelligent methods, Springer, Berlin, pp 1–16
Khan AR (2022) Facial emotion recognition using conventional machine learning and deep learning methods: current achievements, analysis and remaining challenges. Information 13(6):268
Özseven T (2019) A novel feature selection method for speech emotion recognition. Appl Acoust 146:320–326
Zhao J, Mao X, Chen L (2019) Speech emotion recognition using deep 1d & 2d cnn lstm networks. Biomed Signal Process Control 47:312–323
Koelstra S, Muhl C, Soleymani M, Lee JS, Yazdani A, Ebrahimi T, Pun T, Nijholt A, Patras I (2012) DEAP: a database for emotion analysis; using physiological signals. IEEE Trans Affect Comput 3(1):18–31. https://doi.org/10.1109/T-AFFC.2011.15
Zheng WL, Lu BL (2015) Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans Auton Ment Dev 7(3):162–175
Congedo M, Barachant A, Bhatia R (2017) Riemannian geometry for EEG-based brain-computer interfaces; a primer and a review. Brain-Comput Interfaces 4(3):155–174
Asghar MA, Khan MJ, Amin Y, Rizwan M, Rahman M, Badnava S, Mirjavadi SS et al (2019) EEG-based multi-modal emotion recognition using bag of deep features: an optimal feature selection approach. Sensors 19(23):5218
Li Y, Huang J, Zhou H, Zhong N (2017) Human emotion recognition with electroencephalographic multidimensional features by hybrid deep neural networks. Appl Sci 7(10):1060
Asghar MA, Khan MJ, Rizwan M, Shorfuzzaman M, Mehmood RM (2022) AI inspired EEG-based spatial feature selection method using multivariate empirical mode decomposition for emotion classification. Multimed Syst 28(4):1275–1288
Kumari N, Anwar S, Bhattacharjee V (2022) Time series-dependent feature of EEG signals for improved visually evoked emotion classification using EmotionCapsNet. Neural Comput Appl 34:1–13
Schirrmeister RT, Springenberg JT, Fiederer LDJ, Glasstetter M, Eggensperger K, Tangermann M, Hutter F, Burgard W, Ball T (2017) Deep learning with convolutional neural networks for EEG decoding and visualization. Hum Brain Mapp 38(11):5391–5420
Lawhern VJ, Solon AJ, Waytowich NR, Gordon SM, Hung CP, Lance BJ (2018) EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. J Neural Eng 15(5):056013
Huang D, Chen S, Liu C, Zheng L, Tian Z, Jiang D (2021) Differences first in asymmetric brain: a bi-hemisphere discrepancy convolutional neural network for EEG emotion recognition. Neurocomputing 448:140–151. https://doi.org/10.1016/j.neucom.2021.03.105
Hu J, Wang C, Jia Q, Bu Q, Sutcliffe R, Feng J (2021) Scalingnet: extracting features from raw EEG data for emotion recognition. Neurocomputing 463:177–184. https://doi.org/10.1016/j.neucom.2021.08.018
Almanza-Conejo O, Almanza-Ojeda DL, Contreras-Hernandez JL, Ibarra-Manzano MA (2022) Emotion recognition in EEG signals using the continuous wavelet transform and CNNs. Neural Comput Appl 1–14
Zhang T, Zheng W, Cui Z, Zong Y, Li Y (2018) Spatial-temporal recurrent neural network for emotion recognition. IEEE Trans Cybern 49(3):839–847
Zhang Y, Chen J, Tan JH, Chen Y, Chen Y, Li D, Yang L, Su J, Huang X, Che W (2020) An investigation of deep learning models for EEG-based emotion recognition. Front Neurosci 14:622759
Song T, Zheng W, Song P, Cui Z (2018) EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Trans Affect Comput 11:532–541
Zhong P, Wang D, Miao C (2022) EEG-based emotion recognition using regularized graph neural networks. IEEE Trans Affect Comput 13(3):1290–1301. https://doi.org/10.1109/TAFFC.2020.2994159
Chu Y, Zhao X, Zou Y, Xu W, Song G, Han J, Zhao Y (2020) Decoding multiclass motor imagery EEG from the same upper limb by combining Riemannian geometry features and partial least squares regression. J Neural Eng 17(4):046029
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp 7132–7141
Woo S, Park J, Lee JY, Kweon IS (2018) CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19
Bahdanau D, Cho K, Bengio Y (2016) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 10012–10022
Luong MT, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025
Voita E, Talbot D, Moiseev F, Sennrich R, Titov I (2019) Analyzing multi-head self-attention: specialized heads do the heavy lifting, the rest can be pruned. arXiv preprint arXiv:1905.09418
Li Y, Zheng W, Cui Z, Zhang T, Zong Y (2018) A novel neural network model based on cerebral hemispheric asymmetry for EEG emotion recognition. In: IJCAI, pp 1561–1567
Li Y, Fu B, Li F, Shi G, Zheng W (2021) A novel transferability attention neural network model for EEG emotion recognition. Neurocomputing 447:92–101. https://doi.org/10.1016/j.neucom.2021.02.048
Joshi VM, Ghongade RB (2021) EEG based emotion detection using fourth order spectral moment and deep learning. Biomed Signal Process Control 68:102755
Ahmed MZI, Sinha N, Phadikar S, Ghaderpour E (2022) Automated feature extraction on AsMap for emotion classification using EEG. Sensors 22(6):2346
Li Y, Zheng W, Wang L, Zong Y, Cui Z (2022) From regional to global brain: a novel hierarchical spatial-temporal neural network model for EEG emotion recognition. IEEE Trans Affect Comput
Feng L, Cheng C, Zhao M, Deng H, Zhang Y (2022) EEG-based emotion recognition using spatial-temporal graph convolutional LSTM with attention mechanism. IEEE J Biomed Health Inf 26:5406–5417
Miao M, Zheng L, Xu B, Yang Z, Hu W (2023) A multiple frequency bands parallel spatial-temporal 3d deep residual learning framework for EEG-based emotion recognition. Biomed Signal Process Control 79:104141
Salama ES, El-Khoribi RA, Shoman ME, Shalaby MAW (2018) EEG-based emotion recognition using 3d convolutional neural networks. Int J Adv Comput Sci Appl 9(8)
Goghari VM, MacDonald AW III, Sponheim SR (2011) Temporal lobe structures and facial emotion recognition in schizophrenia patients and nonpsychotic relatives. Schizophr Bull 37(6):1281–1294
Kumfor F, Irish M, Hodges JR, Piguet O (2014) Frontal and temporal lobe contributions to emotional enhancement of memory in behavioral-variant frontotemporal dementia and alzheimer’s disease. Front Behav Neurosci 8:225
Kuang D, Michoski C (2022) KAM: a kernel attention module for emotion classification with EEG data. In: Reyes M, Henriques Abreu P, Cardoso J (eds) Interpretability of machine intelligence in medical image computing. Springer Nature, Switzerland, pp 93–103
Schyns PG, Thut G, Gross J (2011) Cracking the code of oscillatory activity. PLoS Biol 9(5):e1001064
Abhang PA, Gawali BW, Mehrotra SC (2016) Introduction to EEG-and speech-based emotion recognition. Academic Press, Cambridge
Acknowledgements
This submission is a substantially extended version of our early idea presented in the workshop on Interpretability of Machine Intelligence in Medical Image Computing at MICCAI 2022. That preliminary work is available at https://link.springer.com/chapter/10.1007/978-3-031-17976-1_9. This work was supported in part by the Young Scientists Fund of the National Natural Science Foundation of China (NSFC) under grant No.12301677 and the Fundamental Research Funds for the Central Universities, Sun Yat-sen University, CHINA, under Grant 22qntd2901.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Permutation matrix A permutation can be represented as a matrix P whose elements only take values in \(\{0, 1\}\), with exactly one 1 in each row and column; hence the sum of each row (column) of a permutation matrix P is always 1. Also note that \(PX\), by definition, reorders X's rows, while \(XP\) reorders X's columns. As an example, consider the matrix/vector system:
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad PX = \begin{pmatrix} x_2 \\ x_3 \\ x_1 \end{pmatrix}.$$
It is immediately clear in this example that if \(P_{ij} = 1\), P will send X's j-th row to the i-th row by left multiplication. Moreover, P is unitary: \(P^TP = PP^T = I\).
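These properties can be verified directly in a short numpy sketch (the particular 3-element permutation below is chosen only for illustration):

```python
import numpy as np

# Permutation matrix: P[0,1] = 1 sends X's row 1 to row 0, etc.
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

X = np.array([[10., 11.],
              [20., 21.],
              [30., 31.]])

PX = P @ X                        # left multiplication reorders rows
assert (PX == X[[1, 2, 0]]).all() # rows 1, 2, 0 of X, in that order

# Each row and column of P sums to 1, and P is unitary: P^T P = P P^T = I.
assert (P.sum(axis=0) == 1).all() and (P.sum(axis=1) == 1).all()
assert (P.T @ P == np.eye(3)).all() and (P @ P.T == np.eye(3)).all()
```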
Lemma 1
The kernel matrix \(M_K(X; \theta )\) has the property:
$$M_K(PX; \theta ) = P\,M_K(X; \theta )\,P^T$$
for any permutation matrix P.
Proof
Suppose the feature block X has shape \(m\times n\). A permutation matrix P (of shape \(m\times m\)) sends the u-th row of X to the i-th row, and the v-th row of X to the j-th row, i.e., \([PX]_i = X_u\) and \([PX]_j = X_v\). In terms of P, this implies that \(P_{iu} = 1\), \(P_{is} = 0\) for \(s \ne u\), and \(P_{jv} = 1\), \(P_{jt} = 0\) for \(t\ne v\).
On the left side, since each entry of the kernel matrix depends only on the corresponding pair of rows, the matrix value at location (i, j) becomes
$$[M_K(PX; \theta )]_{ij} = [M_K(X; \theta )]_{uv},$$
while on the right side,
$$[P\,M_K(X; \theta )\,P^T]_{ij} = \sum _{s,t} P_{is}\,[M_K(X; \theta )]_{st}\,P_{jt} = [M_K(X; \theta )]_{uv}.$$
It is easy to see from the above proof that the conclusion does not depend on the choice of distance function; e.g., \(L^p\) norms for any p can also be used (Footnote 4). \(\square\)
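Lemma 1 can be checked numerically. The Gaussian kernel below is one admissible pairwise-distance kernel, used here for illustration rather than as the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, theta = 5, 4, 2.0
X = rng.standard_normal((m, n))

def kernel_matrix(X, theta):
    # Illustrative kernel: M_K(X; theta)_{uv} = exp(-||X_u - X_v||^2 / theta).
    # Each entry depends only on the pair of rows (X_u, X_v).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / theta)

P = np.eye(m)[rng.permutation(m)]            # random permutation matrix
lhs = kernel_matrix(P @ X, theta)            # M_K(PX; theta)
rhs = P @ kernel_matrix(X, theta) @ P.T      # P M_K(X; theta) P^T
assert np.allclose(lhs, rhs)                 # Lemma 1 holds
```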
Theorem 1
The proposed KAM attention layer, as shown in Eq. (1): \(\varvec{\psi }(X):= [I + M_K(X; \theta )]X\), is permutation equivariant. That is, for any permutation matrix P, \(\varvec{\psi }(PX) = P\varvec{\psi }(X)\).
Proof
This result follows by a direct calculation from Lemma 1, as:
$$\varvec{\psi }(PX) = [I + M_K(PX; \theta )]PX = PX + P\,M_K(X; \theta )\,P^TPX = P[I + M_K(X; \theta )]X = P\varvec{\psi }(X).$$
\(\square\)
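Theorem 1 can likewise be verified numerically, again using an illustrative Gaussian kernel in place of the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, theta = 6, 3, 1.5
X = rng.standard_normal((m, n))

def kernel_matrix(X, theta):
    # Illustrative pairwise-distance kernel (assumed form, not the paper's exact one).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / theta)

def psi(X, theta):
    # KAM layer, Eq. (1): psi(X) = [I + M_K(X; theta)] X
    return (np.eye(len(X)) + kernel_matrix(X, theta)) @ X

P = np.eye(m)[rng.permutation(m)]            # random permutation matrix
# Permutation equivariance: psi(PX) = P psi(X)
assert np.allclose(psi(P @ X, theta), P @ psi(X, theta))
```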
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kuang, D., Michoski, C. Attention with kernels for EEG-based emotion classification. Neural Comput & Applic 36, 5251–5266 (2024). https://doi.org/10.1007/s00521-023-09344-9