DOI: 10.1145/3347320.3357692

Adversarial Domain Adaption for Multi-Cultural Dimensional Emotion Recognition in Dyadic Interactions

Published: 15 October 2019

Abstract

Cross-cultural emotion recognition has been a challenging research problem in the affective computing field. In this paper, we present our solutions for the Cross-cultural Emotion Sub-challenge (CES) of the Audio/Visual Emotion Challenge (AVEC) 2019. The aim of this task is to investigate how emotion knowledge from Western European cultures (German and Hungarian) can be transferred to Chinese culture. Previous studies have shown that cultural differences can significantly degrade the performance of emotion recognition across cultures. In this paper, we propose an unsupervised adversarial domain adaptation approach to bridge the gap between cultures for emotion recognition. The highlights of our complete solution for the CES challenge task include: 1) efficient deep features from multiple modalities, with an LSTM network to capture temporal information; 2) several multimodal interaction strategies that take advantage of the interlocutor's multimodal information; and 3) an unsupervised adversarial adaptation approach to bridge the emotion knowledge gap across cultures. Our solutions achieve the best CCC performance of 0.400, 0.471 and 0.257 for arousal, valence and likability, respectively, on the Chinese challenge test set, outperforming the baseline system's corresponding CCCs of 0.355, 0.468 and 0.041.
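
To make the abstract's two central ingredients concrete, the sketch below illustrates (a) the concordance correlation coefficient (CCC) used to score the challenge, and (b) a gradient-reversal layer, one standard way to realize unsupervised adversarial domain adaptation, combined with an LSTM encoder. This is a minimal PyTorch sketch; the layer sizes, feature dimensions and class names are illustrative assumptions for the example, not the authors' actual configuration.

```python
# Minimal PyTorch sketch of the two ingredients named in the abstract.
# All layer sizes, feature dimensions and names are illustrative
# assumptions, not the authors' actual configuration.
import torch
import torch.nn as nn


def ccc(pred, gold):
    """Concordance correlation coefficient (the challenge metric)."""
    pred_m, gold_m = pred.mean(), gold.mean()
    pred_v, gold_v = pred.var(unbiased=False), gold.var(unbiased=False)
    cov = ((pred - pred_m) * (gold - gold_m)).mean()
    return 2 * cov / (pred_v + gold_v + (pred_m - gold_m) ** 2)


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambda backward."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None


class AdaptiveEmotionRegressor(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        # LSTM encoder captures the temporal dynamics of frame-level
        # multimodal features.
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        # Regression head for one dimension (arousal, valence or likability).
        self.regressor = nn.Linear(hidden, 1)
        # Domain head: source culture (German/Hungarian) vs. target (Chinese).
        self.domain_clf = nn.Linear(hidden, 2)

    def forward(self, x, lamb=1.0):
        h, _ = self.encoder(x)                    # (batch, time, hidden)
        emotion = self.regressor(h).squeeze(-1)   # per-frame prediction
        # The reversed gradient trains the encoder to fool the domain head,
        # encouraging culture-invariant features.
        domain = self.domain_clf(GradReverse.apply(h, lamb))
        return emotion, domain
```

In such a setup, the emotion loss (for instance 1 - CCC) is computed only on the annotated source-culture data, while the domain loss uses culture labels from both source and target; the reversed gradient pushes the encoder toward culture-invariant features, which is why the adaptation requires no Chinese emotion annotations.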

Published In

AVEC '19: Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop
October 2019
96 pages
ISBN: 9781450369138
DOI: 10.1145/3347320

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. adversarial domain adaption
  2. dyadic interaction
  3. multimodal emotion recognition

Qualifiers

  • Research-article

Conference

MM '19

Acceptance Rates

Overall Acceptance Rate 52 of 98 submissions, 53%

Cited By

  • COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:2, 805-822 (Feb 2024). https://doi.org/10.1109/TPAMI.2023.3325770
  • Are 3D Face Shapes Expressive Enough for Recognising Continuous Emotions and Action Unit Intensities? IEEE Transactions on Affective Computing 15:2, 535-548 (Apr 2024). https://doi.org/10.1109/TAFFC.2023.3280530
  • HCI Research and Innovation in China: A 10-Year Perspective. International Journal of Human–Computer Interaction 40:8, 1799-1831 (Mar 2024). https://doi.org/10.1080/10447318.2024.2323858
  • A multimodal fusion-based deep learning framework combined with local-global contextual TCNs for continuous emotion recognition from videos. Applied Intelligence 54:4, 3040-3057 (Feb 2024). https://doi.org/10.1007/s10489-024-05329-w
  • Modelling Stochastic Context of Audio-Visual Expressive Behaviour With Affective Processes. IEEE Transactions on Affective Computing 14:3, 2290-2303 (Jul 2023). https://doi.org/10.1109/TAFFC.2022.3157141
  • Informative Speech Features based on Emotion Classes and Gender in Explainable Speech Emotion Recognition. 2023 11th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 1-8 (Sep 2023). https://doi.org/10.1109/ACIIW59127.2023.10388158
  • Quality-Aware Bag of Modulation Spectrum Features for Robust Speech Emotion Recognition. IEEE Transactions on Affective Computing 13:4, 1892-1905 (Oct 2022). https://doi.org/10.1109/TAFFC.2022.3188223
  • Branch-Fusion-Net for Multi-Modal Continuous Dimensional Emotion Recognition. IEEE Signal Processing Letters 29, 942-946 (2022). https://doi.org/10.1109/LSP.2022.3160373
  • Modularized composite attention network for continuous music emotion recognition. Multimedia Tools and Applications 82:5, 7319-7341 (Aug 2022). https://doi.org/10.1007/s11042-022-13577-6
  • Applied Affective Computing (Jan 2022).
