Abstract
Research on emotion recognition has shown that multi-modal data fusion improves the accuracy and robustness of human emotion recognition and outperforms single-modal methods. Despite the promising results of existing methods, significant challenges remain in effectively fusing data from multiple modalities to achieve superior performance. First, existing works tend to focus on generating a joint representation by fusing multi-modal data, and fewer methods consider the specific characteristics of each modality. Second, most methods fail to fully capture the intricate correlations among multiple modalities and often resort to simplistic combinations of latent features. To address these challenges, we propose a novel fusion network for multi-modal emotion recognition. The network enhances the efficacy of multi-modal fusion while preserving the distinct characteristics of each modality. Specifically, a dual-stream multi-scale feature encoder (MFE) is designed to extract emotional information from temporal slices of both electroencephalogram (EEG) and peripheral physiological signals (PPS). Subsequently, a cross-modal global–local feature fusion module (CGFFM) integrates global and local information from the multi-modal data and assigns a different importance to each modality, so that the fused representation is biased toward the more informative modalities. Meanwhile, a transformer module is employed to further learn modality-specific information. Moreover, we introduce an adaptive collaboration block (ACB), which leverages both modality-specific and cross-modality relations for enhanced integration and feature representation. In extensive experiments on the DEAP and DREAMER multimodal datasets, our model achieves state-of-the-art performance.
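The abstract describes the architecture only at a high level. As a rough illustration, the minimal PyTorch-style sketch below shows how a dual-stream multi-scale encoder, a gated cross-modal fusion module, and a simple stand-in for the adaptive collaboration block could be wired together. Apart from the names MFE, CGFFM, and ACB taken from the abstract, all module internals, layer sizes, and shapes are our own assumptions for illustration and do not reproduce the authors' implementation.

```python
# Hypothetical sketch of a dual-stream multimodal fusion model (PyTorch).
# Internals are illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn

class MFE(nn.Module):
    """Multi-scale feature encoder: parallel 1-D convolutions with different
    kernel sizes over a temporal slice of one modality."""
    def __init__(self, in_ch, out_ch=64, scales=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in scales
        )
        self.proj = nn.Conv1d(out_ch * len(scales), out_ch, 1)

    def forward(self, x):                       # x: (batch, channels, time)
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.proj(feats)                 # (batch, out_ch, time)

class CGFFM(nn.Module):
    """Cross-modal global-local fusion: cross-attention between the two
    streams plus a learned gate that weights each modality."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))

    def forward(self, eeg, pps):                # (batch, time, dim) each
        eeg2pps, _ = self.cross(eeg, pps, pps)  # EEG queries attend to PPS
        pps2eeg, _ = self.cross(pps, eeg, eeg)  # PPS queries attend to EEG
        g = self.gate(torch.cat([eeg2pps.mean(1), pps2eeg.mean(1)], dim=-1))
        return g[:, :1, None] * eeg2pps + g[:, 1:, None] * pps2eeg

class MFNetSketch(nn.Module):
    def __init__(self, eeg_ch=32, pps_ch=8, dim=64, n_classes=2):
        super().__init__()
        self.eeg_enc, self.pps_enc = MFE(eeg_ch, dim), MFE(pps_ch, dim)
        self.modality_tf = nn.TransformerEncoderLayer(dim, 4, batch_first=True)
        self.fusion = CGFFM(dim)
        # ACB stand-in: combine modality-specific and fused features.
        self.acb = nn.Linear(3 * dim, dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, eeg, pps):                # (batch, channels, time) each
        e = self.modality_tf(self.eeg_enc(eeg).transpose(1, 2))
        p = self.modality_tf(self.pps_enc(pps).transpose(1, 2))
        f = self.fusion(e, p)
        z = self.acb(torch.cat([e.mean(1), p.mean(1), f.mean(1)], dim=-1))
        return self.head(z)

# Example with DEAP-like shapes: 32 EEG channels, 8 peripheral channels,
# 128 time samples per slice, batch of 4.
logits = MFNetSketch()(torch.randn(4, 32, 128), torch.randn(4, 8, 128))
```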
Data availability
The authors do not have permission to share data.
References
Fiorini, L., Mancioppi, G., Semeraro, F., Fujita, H., Cavallo, F.: Unsupervised emotional state classification through physiological parameters for social robotics applications. Knowl.-Based Syst. 190, 105217 (2020)
Mane, S.A.M., Shinde, A.: StressNet: hybrid model of LSTM and CNN for stress detection from electroencephalogram signal (EEG). Results Control Optim. 11, 100231 (2023)
Gao, D., Wang, K., Wang, M., Zhou, J., Zhang, Y.: SFT-Net: a network for detecting fatigue from EEG signals by combining 4D feature flow and attention mechanism. IEEE J. Biomed. Health Inform. 28, 4444–4455 (2023). https://api.semanticscholar.org/CorpusID:259153959
Wang, Y., Song, W., Tao, W., Liotta, A., Yang, D., Li, X., Gao, S., Sun, Y., Ge, W., Zhang, W., et al.: A systematic review on affective computing: emotion models, databases, and recent advances. Inf. Fusion 83, 19–52 (2022)
Li, Y., Guo, W., Wang, Y.: Emotion recognition with attention mechanism-guided dual-feature multi-path interaction network. Signal Image Video Process. 1–10 (2024)
Kim, H., Zhang, D., Kim, L., Im, C.-H.: Classification of individual’s discrete emotions reflected in facial microexpressions using electroencephalogram and facial electromyogram. Expert Syst. Appl. 188, 116101 (2022)
Rahman, M.M., Sarkar, A.K., Hossain, M.A., Hossain, M.S., Islam, M.R., Hossain, M.B., Quinn, J.M., Moni, M.A.: Recognition of human emotions using EEG signals: a review. Comput. Biol. Med. 136, 104696 (2021)
Shukla, J., Barreda-Angeles, M., Oliver, J., Nandi, G.C., Puig, D.: Feature extraction and selection for emotion recognition from electrodermal activity. IEEE Trans. Affect. Comput. 12(4), 857–869 (2019)
Zhang, Q., Chen, X., Zhan, Q., Yang, T., Xia, S.: Respiration-based emotion recognition with deep learning. Comput. Ind. 92, 84–90 (2017)
Saleem, A.A., Siddiqui, H.U.R., Raza, M.A., Rustam, F., Dudley, S.E.M., Ashraf, I.: A systematic review of physiological signals based driver drowsiness detection systems. Cogn. Neurodyn. 17, 1229–1259 (2022)
Liu, H., Lou, T., Zhang, Y., Wu, Y., Xiao, Y., Jensen, C.S., Zhang, D.: EEG-based multimodal emotion recognition: a machine learning perspective. IEEE Trans. Instrum. Meas. (2024)
Ferri, F., Tajadura-Jiménez, A., Väljamäe, A., Vastano, R., Costantini, M.: Emotion-inducing approaching sounds shape the boundaries of multisensory peripersonal space. Neuropsychologia 70, 468–475 (2015)
Ekman, P., Friesen, W.V., Ellsworth, P.C.: Emotion in the human face: guidelines for research and an integration of findings (1972). https://api.semanticscholar.org/CorpusID:141855078
Zhao, S., Jia, G., Yang, J., Ding, G., Keutzer, K.: Emotion recognition from multiple modalities: fundamentals and methodologies. IEEE Signal Process. Mag. 38, 59–73 (2021)
Ackermann, P., Kohlschein, C., Bitsch, J.A., Wehrle, K., Jeschke, S.: EEG-based automatic emotion recognition: feature extraction, selection and classification methods. In: 2016 IEEE 18th International Conference on E-health Networking, Applications and Services (Healthcom), pp. 1–6. IEEE (2016)
Zhang, Y., Zhang, Y., Wang, S.: An attention-based hybrid deep learning model for EEG emotion recognition. SIViP 17(5), 2305–2313 (2023)
Tao, W., Li, C., Song, R., Cheng, J., Liu, Y., Wan, F., Chen, X.: EEG-based emotion recognition via channel-wise attention and self attention. IEEE Trans. Affect. Comput. 14(1), 382–393 (2020)
Liu, Y., Ding, Y., Li, C., Cheng, J., Song, R., Wan, F., Chen, X.: Multi-channel EEG-based emotion recognition via a multi-level features guided capsule network. Comput. Biol. Med. 123, 103927 (2020)
Li, D., Xie, L., Chai, B., Wang, Z., Yang, H.: Spatial-frequency convolutional self-attention network for EEG emotion recognition. Appl. Soft Comput. 122, 108740 (2022)
Li, C., Wang, B., Zhang, S., Liu, Y., Song, R., Cheng, J., Chen, X.: Emotion recognition from EEG based on multi-task learning with capsule network and attention mechanism. Comput. Biol. Med. 143, 105303 (2022)
Ru, X., He, K., Lyu, B., Li, D., Xu, W., Gu, W., Ma, X., Liu, J., Li, C., Li, T., et al.: Multimodal neuroimaging with optically pumped magnetometers: a simultaneous MEG-EEG-FNIRS acquisition system. Neuroimage 259, 119420 (2022)
Poria, S., Cambria, E., Bajpai, R., Hussain, A.: A review of affective computing: from unimodal analysis to multimodal fusion. Inf. Fusion 37, 98–125 (2017)
Baltrušaitis, T., Ahuja, C., Morency, L.-P.: Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41(2), 423–443 (2018)
Agarwal, R., Andujar, M., Canavan, S.J.: Classification of emotions using EEG activity associated with different areas of the brain. Pattern Recognit. Lett. 162, 71–80 (2022)
Lin, W., Li, C., Sun, S.: Deep convolutional neural network for emotion recognition using EEG and peripheral physiological signal. In: Image and Graphics: 9th International Conference, ICIG 2017, Shanghai, China, September 13-15, 2017, Revised Selected Papers, Part II 9, pp. 385–394. Springer (2017)
Ma, J., Tang, H., Zheng, W.-L., Lu, B.-L.: Emotion recognition using multimodal residual LSTM network. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 176–183 (2019)
Li, Q., Liu, Y., Yan, F., Zhang, Q., Liu, C.: Emotion recognition based on multiple physiological signals. Zhongguo yi liao qi xie za zhi = Chin. J. Med. Instrum. 44(4), 283–287 (2020)
Chen, S., Tang, J., Zhu, L., Kong, W.: A multi-stage dynamical fusion network for multimodal emotion recognition. Cogn. Neurodyn. 17, 671–680 (2022)
Wang, Y., Jiang, W.-B., Li, R., Lu, B.-L.: Emotion transformer fusion: complementary representation properties of EEG and eye movements on recognizing anger and surprise. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1575–1578. IEEE (2021)
Gong, L., Chen, W., Li, M., Zhang, T.: Emotion recognition from multiple physiological signals using intra-and inter-modality attention fusion network. Digit. Signal Process. 144, 104278 (2024)
Liu, W., Qiu, J., Zheng, W.-L., Lu, B.-L.: Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition. IEEE Trans. Cognit. Dev. Syst. 14, 715–729 (2021)
Fu, B., Gu, C., Fu, M., Xia, Y., Liu, Y.: A novel feature fusion network for multimodal emotion recognition from EEG and eye movement signals. Front. Neurosci. 17, 1234162 (2023)
Zhang, Y., Cheng, C., Zhang, Y.: Multimodal emotion recognition using a hierarchical fusion convolutional neural network. IEEE Access 9, 7943–7951 (2021). https://doi.org/10.1109/ACCESS.2021.3049516
Koelstra, S., Muhl, C., Soleymani, M., Lee, J.-S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., Patras, I.: DEAP: a database for emotion analysis using physiological signals. IEEE Trans. Affect. Comput. 3(1), 18–31 (2011)
Morris, J.D.: Observations: SAM: the Self-assessment Manikin an efficient cross-cultural measurement of emotional response 1. J. Advert. Res. 35(6), 63–68 (1995)
Katsigiannis, S., Ramzan, N.: DREAMER: a database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE J. Biomed. Health Inform. 22(1), 98–107 (2017)
Acknowledgements
We sincerely appreciate all the editors and reviewers for their insightful comments and constructive suggestions. This work was supported by the Key Research and Development Project of Zhejiang Province (Grant No. 2020C04009), the Laboratory of Brain Machine Collaborative Intelligence (Grant No. 2020E10010), and the Zhejiang Provincial Natural Science Foundation of China (Grant No. LGF22H090004).
Author information
Authors and Affiliations
Contributions
All authors have contributed equally.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhu, L., Ding, Y., Huang, A. et al. MF-Net: a multimodal fusion network for emotion recognition based on multiple physiological signals. SIViP 19, 58 (2025). https://doi.org/10.1007/s11760-024-03632-0