ABSTRACT
Recognising and monitoring emotional states play a crucial role in mental health and well-being management. Importantly, with the widespread adoption of smart mobile and wearable devices, it has become easier to collect long-term, granular, emotion-related physiological data passively, continuously, and remotely. This creates new opportunities to help individuals manage their emotions and well-being in a less intrusive manner using off-the-shelf, low-cost devices. Pervasive emotion recognition based on physiological signals remains challenging, however, due to the difficulty of efficiently extracting high-order correlations between physiological signals and users' emotional states. In this paper, we propose a novel end-to-end emotion recognition system based on a convolution-augmented transformer architecture. Specifically, it recognises users' emotions on the dimensions of arousal and valence by learning both the global and the local fine-grained associations and dependencies within and across multimodal physiological data (including blood volume pulse, electrodermal activity, heart rate, and skin temperature). We extensively evaluated the performance of our model using the K-EmoCon dataset, which was acquired during naturalistic conversations using off-the-shelf devices and contains spontaneous emotion data. Our results demonstrate that our approach outperforms the baselines and achieves state-of-the-art or competitive performance. We also demonstrate the effectiveness and generalisability of our system on another affective dataset that used affect inducement and commercial physiological sensors.
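To make the core architectural idea concrete, below is a minimal NumPy sketch of a convolution-augmented (Conformer-style) transformer block: self-attention captures global dependencies across the whole window, while a depthwise convolution captures local fine-grained patterns, and both are combined through residual connections. All names, dimensions, and weights here are illustrative assumptions for exposition, not the authors' actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Global module: every time step attends to every other time step."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def depthwise_conv(x, kernels):
    """Local module: each channel is filtered independently (groups == channels)."""
    t, c = x.shape
    pad = kernels.shape[1] // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty_like(x)
    for ch in range(c):
        out[:, ch] = np.convolve(xp[:, ch], kernels[ch], mode="valid")
    return out

def conformer_block(x, p):
    """One convolution-augmented transformer block with residual connections."""
    x = x + self_attention(x, p["wq"], p["wk"], p["wv"])  # global dependencies
    x = x + depthwise_conv(x, p["kernels"])               # local fine-grained patterns
    x = x + np.tanh(x @ p["w1"]) @ p["w2"]                # pointwise feed-forward
    return x

# Toy multimodal input: 4 channels (BVP, EDA, HR, skin temperature) over 32 steps,
# linearly projected to a shared model dimension before entering the block.
rng = np.random.default_rng(0)
t_steps, d_model = 32, 8
signals = rng.standard_normal((t_steps, 4))
x = signals @ (0.1 * rng.standard_normal((4, d_model)))
p = {
    "wq": 0.1 * rng.standard_normal((d_model, d_model)),
    "wk": 0.1 * rng.standard_normal((d_model, d_model)),
    "wv": 0.1 * rng.standard_normal((d_model, d_model)),
    "kernels": 0.1 * rng.standard_normal((d_model, 5)),  # depthwise kernel size 5
    "w1": 0.1 * rng.standard_normal((d_model, d_model)),
    "w2": 0.1 * rng.standard_normal((d_model, d_model)),
}
out = conformer_block(x, p)
print(out.shape)  # (32, 8)
```

In a full system, stacked blocks of this kind would feed a pooling layer and classification heads for arousal and valence; this sketch only illustrates how the attention and convolution paths complement each other within one block.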
Mobile Emotion Recognition via Multiple Physiological Signals using Convolution-augmented Transformer