DOI: 10.1145/3474085.3475583
ACM Conference Proceedings · Research article

HetEmotionNet: Two-Stream Heterogeneous Graph Recurrent Neural Network for Multi-modal Emotion Recognition

Published: 17 October 2021

Abstract

Research on human emotion under multimedia stimulation based on physiological signals is an emerging field, and important progress has been achieved in emotion recognition from multi-modal signals. However, it remains challenging to make full use of the complementarity among spatial-spectral-temporal domain features for emotion recognition, and to model the heterogeneity and correlation among multi-modal signals. In this paper, we propose HetEmotionNet, a novel two-stream heterogeneous graph recurrent neural network that fuses multi-modal physiological signals for emotion recognition. Specifically, HetEmotionNet consists of a spatial-temporal stream and a spatial-spectral stream, which fuse spatial-spectral-temporal domain features in a unified framework. Each stream is composed of a graph transformer network for modeling heterogeneity, a graph convolutional network for modeling correlation, and a gated recurrent unit for capturing the temporal-domain or spectral-domain dependency. Extensive experiments on two real-world datasets demonstrate that our proposed model outperforms state-of-the-art baselines.
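The paper's implementation is not reproduced here, but the per-stream pipeline the abstract describes — a GTN-style soft combination of relation-specific adjacency matrices (heterogeneity), a GCN propagation step (correlation), then a GRU over successive steps (temporal or spectral dependency) — can be sketched in plain NumPy. All names, shapes, and the toy data below are illustrative assumptions, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gtn_combine(adjs, logits):
    """GTN-style soft relation selection: a learned convex
    combination of relation-specific adjacency matrices."""
    w = softmax(logits)                      # one weight per relation
    return sum(wi * a for wi, a in zip(w, adjs))

def gcn_layer(A, H, W):
    """One GCN propagation step with self-loops and
    symmetric degree normalization, followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0)

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """Standard GRU cell (biases omitted for brevity)."""
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = sig(x @ Wz + h @ Uz)                 # update gate
    r = sig(x @ Wr + h @ Ur)                 # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh) # candidate state
    return (1 - z) * h + z * h_tilde

# Toy setup: 6 graph nodes (channels), 2 relations, 4 features, 5 steps.
n, f, hdim, T = 6, 4, 8, 5
adjs = [rng.random((n, n)) for _ in range(2)]
W_gcn = rng.standard_normal((f, hdim)) * 0.1
gru_params = [rng.standard_normal((hdim, hdim)) * 0.1 for _ in range(6)]

A = gtn_combine(adjs, logits=np.array([0.3, -0.1]))
h = np.zeros((n, hdim))
for t in range(T):
    X_t = rng.random((n, f))                 # node features at step t
    H_t = gcn_layer(A, X_t, W_gcn)           # spatial aggregation
    h = gru_cell(H_t, h, *gru_params)        # sequential dependency

print(h.shape)  # → (6, 8): one hidden state per node
```

In the paper's framework the same recurrence is run twice — once over time windows (spatial-temporal stream) and once over frequency bands (spatial-spectral stream) — before the two streams are fused for classification; this toy loop shows only one such pass.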

Supplementary Material

ZIP File (mfp2235aux.zip)
Notation and explanations.
MP4 File (MM21-mfp2235.mp4)
This video introduces our model HetEmotionNet in detail. We review emotion and prior research, explain why it is challenging to make full use of the complementarity among spatial-spectral-temporal domain features and to model the heterogeneity and correlation among multi-modal signals, and then present HetEmotionNet, a novel network fusing multi-modal physiological signals for emotion recognition.





Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. affective computing
  2. graph recurrent neural network
  3. heterogeneous graph
  4. multi-modal emotion recognition

Qualifiers

  • Research-article


Conference

MM '21: ACM Multimedia Conference
October 20-24, 2021
Virtual Event, China

Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions (25%)



Article Metrics

  • Downloads (last 12 months): 200
  • Downloads (last 6 weeks): 12
Reflects downloads up to 17 Feb 2025


Cited By

  • (2025) Meaningful Multimodal Emotion Recognition Based on Capsule Graph Transformer Architecture. Information, 16(1):40. DOI: 10.3390/info16010040. Online publication date: 10-Jan-2025.
  • (2025) Comprehensive Multisource Learning Network for Cross-Subject Multimodal Emotion Recognition. IEEE Transactions on Emerging Topics in Computational Intelligence, 9(1):365-380. DOI: 10.1109/TETCI.2024.3406422. Online publication date: Feb-2025.
  • (2024) A Comprehensive Survey on Emerging Techniques and Technologies in Spatio-Temporal EEG Data Analysis. Chinese Journal of Information Fusion, 1(3):183-211. DOI: 10.62762/CJIF.2024.876830. Online publication date: 15-Dec-2024.
  • (2024) A Comprehensive Interaction in Multiscale Multichannel EEG Signals for Emotion Recognition. Mathematics, 12(8):1180. DOI: 10.3390/math12081180. Online publication date: 15-Apr-2024.
  • (2024) Domain adaptation spatial feature perception neural network for cross-subject EEG emotion recognition. Frontiers in Human Neuroscience, 18. DOI: 10.3389/fnhum.2024.1471634. Online publication date: 17-Dec-2024.
  • (2024) VSGT. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 3078-3086. DOI: 10.24963/ijcai.2024/341. Online publication date: 3-Aug-2024.
  • (2024) Multi-level disentangling network for cross-subject emotion recognition based on multimodal physiological signals. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 3069-3077. DOI: 10.24963/ijcai.2024/340. Online publication date: 3-Aug-2024.
  • (2024) Enhancing cross-subject emotion recognition precision through unimodal EEG: a novel emotion preceptor model. Brain Informatics, 11(1). DOI: 10.1186/s40708-024-00245-8. Online publication date: 18-Dec-2024.
  • (2024) Drug repurposing based on the DTD-GNN graph neural network: revealing the relationships among drugs, targets and diseases. BMC Genomics, 25(1). DOI: 10.1186/s12864-024-10499-5. Online publication date: 11-Jun-2024.
  • (2024) GNN4EEG: A Benchmark and Toolkit for Electroencephalography Classification with Graph Neural Network. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 612-617. DOI: 10.1145/3675094.3678475. Online publication date: 5-Oct-2024.
