ABSTRACT
Recognising and monitoring emotional states play a crucial role in mental health and well-being management. Importantly, with the widespread adoption of smart mobile and wearable devices, it has become easier to collect long-term, granular, emotion-related physiological data passively, continuously, and remotely. This creates new opportunities to help individuals manage their emotions and well-being in a less intrusive manner using off-the-shelf, low-cost devices. Pervasive emotion recognition based on physiological signals remains challenging, however, due to the difficulty of efficiently extracting high-order correlations between physiological signals and users' emotional states. In this paper, we propose a novel end-to-end emotion recognition system based on a convolution-augmented transformer architecture. Specifically, it recognises users' emotions on the dimensions of arousal and valence by learning both the global and the local fine-grained associations and dependencies within and across multimodal physiological data (including blood volume pulse, electrodermal activity, heart rate, and skin temperature). We extensively evaluated the performance of our model using the K-EmoCon dataset, which was acquired during naturalistic conversations using off-the-shelf devices and contains spontaneous emotion data. Our results demonstrate that our approach outperforms the baselines and achieves state-of-the-art or competitive performance. We also demonstrate the effectiveness and generalisability of our system on another affective dataset that used affect inducement and commercial physiological sensors.
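To make the core architectural idea concrete, below is a minimal NumPy sketch of a convolution-augmented (Conformer-style) transformer block: self-attention captures global dependencies across the whole window, while a depthwise convolution captures local fine-grained patterns, and both are combined through residual connections. All names, dimensions, and weights here are illustrative assumptions for exposition, not the authors' actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Global module: every time step attends to every other time step."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def depthwise_conv(x, kernels):
    """Local module: each channel is filtered independently (groups == channels)."""
    t, c = x.shape
    pad = kernels.shape[1] // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty_like(x)
    for ch in range(c):
        out[:, ch] = np.convolve(xp[:, ch], kernels[ch], mode="valid")
    return out

def conformer_block(x, p):
    """One convolution-augmented transformer block with residual connections."""
    x = x + self_attention(x, p["wq"], p["wk"], p["wv"])  # global dependencies
    x = x + depthwise_conv(x, p["kernels"])               # local fine-grained patterns
    x = x + np.tanh(x @ p["w1"]) @ p["w2"]                # pointwise feed-forward
    return x

# Toy multimodal input: 4 channels (BVP, EDA, HR, skin temperature) over 32 steps,
# linearly projected to a shared model dimension before entering the block.
rng = np.random.default_rng(0)
t_steps, d_model = 32, 8
signals = rng.standard_normal((t_steps, 4))
x = signals @ (0.1 * rng.standard_normal((4, d_model)))
p = {
    "wq": 0.1 * rng.standard_normal((d_model, d_model)),
    "wk": 0.1 * rng.standard_normal((d_model, d_model)),
    "wv": 0.1 * rng.standard_normal((d_model, d_model)),
    "kernels": 0.1 * rng.standard_normal((d_model, 5)),  # depthwise kernel size 5
    "w1": 0.1 * rng.standard_normal((d_model, d_model)),
    "w2": 0.1 * rng.standard_normal((d_model, d_model)),
}
out = conformer_block(x, p)
print(out.shape)  # (32, 8)
```

In a full system, stacked blocks of this kind would feed a pooling layer and classification heads for arousal and valence; this sketch only illustrates how the attention and convolution paths complement each other within one block.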
Mobile Emotion Recognition via Multiple Physiological Signals using Convolution-augmented Transformer