Abstract
Research on emotion recognition has shown that multi-modal data fusion improves the accuracy and robustness of human emotion recognition and outperforms single-modal methods. Despite the promising results of existing methods, significant challenges remain in effectively fusing data from multiple modalities to achieve superior performance. First, existing works tend to focus on generating a joint representation by fusing multi-modal data, and fewer methods consider the specific characteristics of each modality. Second, most methods fail to fully capture the intricate correlations among multiple modalities and often resort to simplistic combinations of latent features. To address these challenges, we propose a novel fusion network for multi-modal emotion recognition. The network enhances the efficacy of multi-modal fusion while preserving the distinct characteristics of each modality. Specifically, a dual-stream multi-scale feature encoder (MFE) is designed to extract emotional information from temporal slices of both electroencephalogram (EEG) and peripheral physiological signals (PPS). Subsequently, a cross-modal global–local feature fusion module (CGFFM) integrates global and local information from the multi-modal data and assigns a different importance to each modality, so that the fused representation is biased toward the more informative modalities. Meanwhile, a transformer module is employed to further learn modality-specific information. Moreover, we introduce an adaptive collaboration block (ACB), which leverages both modality-specific and cross-modality relations for enhanced integration and feature representation. In extensive experiments on the DEAP and DREAMER multimodal datasets, our model achieves state-of-the-art performance.
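The abstract describes the architecture only at a high level. As a rough illustration, the minimal PyTorch-style sketch below shows how a dual-stream multi-scale encoder, a gated cross-modal fusion module, and a simple stand-in for the adaptive collaboration block could be wired together. Apart from the names MFE, CGFFM, and ACB taken from the abstract, all module internals, layer sizes, and shapes are our own assumptions for illustration and do not reproduce the authors' implementation.

```python
# Hypothetical sketch of a dual-stream multimodal fusion model (PyTorch).
# Internals are illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn

class MFE(nn.Module):
    """Multi-scale feature encoder: parallel 1-D convolutions with different
    kernel sizes over a temporal slice of one modality."""
    def __init__(self, in_ch, out_ch=64, scales=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in scales
        )
        self.proj = nn.Conv1d(out_ch * len(scales), out_ch, 1)

    def forward(self, x):                       # x: (batch, channels, time)
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.proj(feats)                 # (batch, out_ch, time)

class CGFFM(nn.Module):
    """Cross-modal global-local fusion: cross-attention between the two
    streams plus a learned gate that weights each modality."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))

    def forward(self, eeg, pps):                # (batch, time, dim) each
        eeg2pps, _ = self.cross(eeg, pps, pps)  # EEG queries attend to PPS
        pps2eeg, _ = self.cross(pps, eeg, eeg)  # PPS queries attend to EEG
        g = self.gate(torch.cat([eeg2pps.mean(1), pps2eeg.mean(1)], dim=-1))
        return g[:, :1, None] * eeg2pps + g[:, 1:, None] * pps2eeg

class MFNetSketch(nn.Module):
    def __init__(self, eeg_ch=32, pps_ch=8, dim=64, n_classes=2):
        super().__init__()
        self.eeg_enc, self.pps_enc = MFE(eeg_ch, dim), MFE(pps_ch, dim)
        self.modality_tf = nn.TransformerEncoderLayer(dim, 4, batch_first=True)
        self.fusion = CGFFM(dim)
        # ACB stand-in: combine modality-specific and fused features.
        self.acb = nn.Linear(3 * dim, dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, eeg, pps):                # (batch, channels, time) each
        e = self.modality_tf(self.eeg_enc(eeg).transpose(1, 2))
        p = self.modality_tf(self.pps_enc(pps).transpose(1, 2))
        f = self.fusion(e, p)
        z = self.acb(torch.cat([e.mean(1), p.mean(1), f.mean(1)], dim=-1))
        return self.head(z)

# Example with DEAP-like shapes: 32 EEG channels, 8 peripheral channels,
# 128 time samples per slice, batch of 4.
logits = MFNetSketch()(torch.randn(4, 32, 128), torch.randn(4, 8, 128))
```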
Data availability
The authors do not have permission to share data.
References
Fiorini, L., Mancioppi, G., Semeraro, F., Fujita, H., Cavallo, F.: Unsupervised emotional state classification through physiological parameters for social robotics applications. Knowl.-Based Syst. 190, 105217 (2020)
Mane, S.A.M., Shinde, A.: StressNet: hybrid model of LSTM and CNN for stress detection from electroencephalogram signal (EEG). Results Control Optim. 11, 100231 (2023)
Gao, D., Wang, K., Wang, M., Zhou, J., Zhang, Y.: SFT-Net: a network for detecting fatigue from EEG signals by combining 4D feature flow and attention mechanism. IEEE J. Biomed. Health Inform. 28, 4444–4455 (2023). https://api.semanticscholar.org/CorpusID:259153959
Wang, Y., Song, W., Tao, W., Liotta, A., Yang, D., Li, X., Gao, S., Sun, Y., Ge, W., Zhang, W., et al.: A systematic review on affective computing: emotion models, databases, and recent advances. Inf. Fusion 83, 19–52 (2022)
Li, Y., Guo, W., Wang, Y.: Emotion recognition with attention mechanism-guided dual-feature multi-path interaction network. Signal Image Video Process. 1–10 (2024)
Kim, H., Zhang, D., Kim, L., Im, C.-H.: Classification of individual’s discrete emotions reflected in facial microexpressions using electroencephalogram and facial electromyogram. Expert Syst. Appl. 188, 116101 (2022)
Rahman, M.M., Sarkar, A.K., Hossain, M.A., Hossain, M.S., Islam, M.R., Hossain, M.B., Quinn, J.M., Moni, M.A.: Recognition of human emotions using EEG signals: a review. Comput. Biol. Med. 136, 104696 (2021)
Shukla, J., Barreda-Angeles, M., Oliver, J., Nandi, G.C., Puig, D.: Feature extraction and selection for emotion recognition from electrodermal activity. IEEE Trans. Affect. Comput. 12(4), 857–869 (2019)
Zhang, Q., Chen, X., Zhan, Q., Yang, T., Xia, S.: Respiration-based emotion recognition with deep learning. Comput. Ind. 92, 84–90 (2017)
Saleem, A.A., Siddiqui, H.U.R., Raza, M.A., Rustam, F., Dudley, S.E.M., Ashraf, I.: A systematic review of physiological signals based driver drowsiness detection systems. Cogn. Neurodyn. 17, 1229–1259 (2022)
Liu, H., Lou, T., Zhang, Y., Wu, Y., Xiao, Y., Jensen, C.S., Zhang, D.: EEG-based multimodal emotion recognition: a machine learning perspective. IEEE Trans. Instrum. Meas. (2024)
Ferri, F., Tajadura-Jiménez, A., Väljamäe, A., Vastano, R., Costantini, M.: Emotion-inducing approaching sounds shape the boundaries of multisensory peripersonal space. Neuropsychologia 70, 468–475 (2015)
Ekman, P., Friesen, W.V., Ellsworth, P.C.: Emotion in the human face: guidelines for research and an integration of findings (1972). https://api.semanticscholar.org/CorpusID:141855078
Zhao, S., Jia, G., Yang, J., Ding, G., Keutzer, K.: Emotion recognition from multiple modalities: fundamentals and methodologies. IEEE Signal Process. Mag. 38, 59–73 (2021)
Ackermann, P., Kohlschein, C., Bitsch, J.A., Wehrle, K., Jeschke, S.: EEG-based automatic emotion recognition: feature extraction, selection and classification methods. In: 2016 IEEE 18th International Conference on E-health Networking, Applications and Services (Healthcom), pp. 1–6. IEEE (2016)
Zhang, Y., Zhang, Y., Wang, S.: An attention-based hybrid deep learning model for EEG emotion recognition. SIViP 17(5), 2305–2313 (2023)
Tao, W., Li, C., Song, R., Cheng, J., Liu, Y., Wan, F., Chen, X.: EEG-based emotion recognition via channel-wise attention and self attention. IEEE Trans. Affect. Comput. 14(1), 382–393 (2020)
Liu, Y., Ding, Y., Li, C., Cheng, J., Song, R., Wan, F., Chen, X.: Multi-channel EEG-based emotion recognition via a multi-level features guided capsule network. Comput. Biol. Med. 123, 103927 (2020)
Li, D., Xie, L., Chai, B., Wang, Z., Yang, H.: Spatial-frequency convolutional self-attention network for EEG emotion recognition. Appl. Soft Comput. 122, 108740 (2022)
Li, C., Wang, B., Zhang, S., Liu, Y., Song, R., Cheng, J., Chen, X.: Emotion recognition from EEG based on multi-task learning with capsule network and attention mechanism. Comput. Biol. Med. 143, 105303 (2022)
Ru, X., He, K., Lyu, B., Li, D., Xu, W., Gu, W., Ma, X., Liu, J., Li, C., Li, T., et al.: Multimodal neuroimaging with optically pumped magnetometers: a simultaneous MEG-EEG-FNIRS acquisition system. Neuroimage 259, 119420 (2022)
Poria, S., Cambria, E., Bajpai, R., Hussain, A.: A review of affective computing: from unimodal analysis to multimodal fusion. Inf. Fusion 37, 98–125 (2017)
Baltrušaitis, T., Ahuja, C., Morency, L.-P.: Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41(2), 423–443 (2018)
Agarwal, R., Andujar, M., Canavan, S.J.: Classification of emotions using EEG activity associated with different areas of the brain. Pattern Recognit. Lett. 162, 71–80 (2022)
Lin, W., Li, C., Sun, S.: Deep convolutional neural network for emotion recognition using EEG and peripheral physiological signal. In: Image and Graphics: 9th International Conference, ICIG 2017, Shanghai, China, September 13-15, 2017, Revised Selected Papers, Part II 9, pp. 385–394. Springer (2017)
Ma, J., Tang, H., Zheng, W.-L., Lu, B.-L.: Emotion recognition using multimodal residual LSTM network. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 176–183 (2019)
Li, Q., Liu, Y., Yan, F., Zhang, Q., Liu, C.: Emotion recognition based on multiple physiological signals. Zhongguo yi liao qi xie za zhi = Chin. J. Med. Instrum. 44(4), 283–287 (2020)
Chen, S., Tang, J., Zhu, L., Kong, W.: A multi-stage dynamical fusion network for multimodal emotion recognition. Cogn. Neurodyn. 17, 671–680 (2022)
Wang, Y., Jiang, W.-B., Li, R., Lu, B.-L.: Emotion transformer fusion: complementary representation properties of EEG and eye movements on recognizing anger and surprise. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1575–1578. IEEE (2021)
Gong, L., Chen, W., Li, M., Zhang, T.: Emotion recognition from multiple physiological signals using intra-and inter-modality attention fusion network. Digit. Signal Process. 144, 104278 (2024)
Liu, W., Qiu, J., Zheng, W.-L., Lu, B.-L.: Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition. IEEE Trans. Cognit. Dev. Syst. 14, 715–729 (2021)
Fu, B., Gu, C., Fu, M., Xia, Y., Liu, Y.: A novel feature fusion network for multimodal emotion recognition from EEG and eye movement signals. Front. Neurosci. 17, 1234162 (2023)
Zhang, Y., Cheng, C., Zhang, Y.: Multimodal emotion recognition using a hierarchical fusion convolutional neural network. IEEE Access 9, 7943–7951 (2021). https://doi.org/10.1109/ACCESS.2021.3049516
Koelstra, S., Muhl, C., Soleymani, M., Lee, J.-S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., Patras, I.: DEAP: a database for emotion analysis using physiological signals. IEEE Trans. Affect. Comput. 3(1), 18–31 (2011)
Morris, J.D.: Observations: SAM: the Self-assessment Manikin an efficient cross-cultural measurement of emotional response 1. J. Advert. Res. 35(6), 63–68 (1995)
Katsigiannis, S., Ramzan, N.: DREAMER: a database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE J. Biomed. Health Inform. 22(1), 98–107 (2017)
Acknowledgements
We sincerely appreciate all the editors and reviewers for their insightful comments and constructive suggestions. This work was supported by the Key Research and Development Project of Zhejiang Province (Grant No. 2020C04009), the Laboratory of Brain Machine Collaborative Intelligence (Grant No. 2020E10010), and the Zhejiang Provincial Natural Science Foundation of China (Grant No. LGF22H090004).
Author information
Authors and Affiliations
Contributions
All authors have contributed equally.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, L., Ding, Y., Huang, A. et al. MF-Net: a multimodal fusion network for emotion recognition based on multiple physiological signals. SIViP 19, 58 (2025). https://doi.org/10.1007/s11760-024-03632-0