DOI: 10.1145/3395035.3425254

Toward Mathematical Representation of Emotion: A Deep Multitask Learning Method Based On Multimodal Recognition

Published: 27 December 2020

Abstract

To emulate human emotions in agents, a mathematical representation of emotion (an emotional space) is essential for every component, such as emotion recognition, generation, and expression. In this study, we aim to acquire a modality-independent emotional space by extracting the emotional information shared across different modalities. We propose a method that acquires an emotional space by integrating multiple modalities in a deep neural network (DNN) and combining an emotion recognition task with a unification task: the emotion recognition task learns the representation of emotions, while the unification task learns an identical emotional space from each modality. Through experiments with audio-visual data, we confirmed that emotional spaces acquired from single modalities differ from one another, and that the proposed method can acquire a joint emotional space. We also showed that, under this paper's experimental conditions, the proposed method can adequately represent emotions in a low-dimensional emotional space of around five or six dimensions.
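
The following is a minimal sketch (in PyTorch-style Python, not the authors' code) of the multitask setup described above: two modality-specific encoders map audio and visual features into a shared low-dimensional emotional space, a shared classifier provides the emotion recognition loss, and a unification loss pulls the two modalities' embeddings of the same sample together. All names, layer sizes, the embedding dimensionality, and the choice of mean-squared error for the unification term are illustrative assumptions, not details taken from the paper.

    # Sketch of the multitask objective: emotion recognition + unification.
    # Dimensions and loss choices are assumptions; the paper's architecture may differ.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    EMO_DIM = 6        # assumed size of the shared emotional space (five or six dims)
    NUM_CLASSES = 8    # assumed number of emotion categories

    class ModalityEncoder(nn.Module):
        """Maps one modality's feature vector into the shared emotional space."""
        def __init__(self, in_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 256), nn.ReLU(),
                nn.Linear(256, EMO_DIM),
            )

        def forward(self, x):
            return self.net(x)

    audio_enc = ModalityEncoder(in_dim=128)       # e.g., pooled acoustic features
    visual_enc = ModalityEncoder(in_dim=512)      # e.g., pooled facial features
    classifier = nn.Linear(EMO_DIM, NUM_CLASSES)  # shared emotion-recognition head

    def multitask_loss(audio_x, visual_x, labels, alpha=1.0):
        za, zv = audio_enc(audio_x), visual_enc(visual_x)
        # Emotion recognition task: both modalities feed one shared classifier,
        # so the embedding must encode emotion regardless of modality.
        rec_loss = F.cross_entropy(classifier(za), labels) + \
                   F.cross_entropy(classifier(zv), labels)
        # Unification task: embeddings of the same sample from different
        # modalities are pulled toward the same point in the emotional space.
        uni_loss = F.mse_loss(za, zv)
        return rec_loss + alpha * uni_loss

In this sketch, minimizing the unification term alongside the recognition term is what encourages a single, modality-independent emotional space; alpha trades off how strictly the two modalities are aligned.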


Cited By

  • (2022) Audio-Visual Shared Emotion Representation for Robust Emotion Recognition on Modality Missing Using Hemi-hyperspherical Embedding and Latent Space Unification. HCI International 2022 Posters, 137-143. https://doi.org/10.1007/978-3-031-06388-6_18 (online publication date: 16 June 2022)

Published In

cover image ACM Conferences
ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction
October 2020
548 pages
ISBN: 9781450380027
DOI: 10.1145/3395035
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 December 2020

Author Tags

  1. affective computing
  2. deep neural networks
  3. emotional space
  4. multimodal fusion
  5. multitask learning

Qualifiers

  • Short-paper

Funding Sources

  • The Ministry of Education, Culture, Sports, Science and Technology-Japan, Grant-in-Aid for Scientific Research

Conference

ICMI '20
Sponsor:
ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
October 25 - 29, 2020
Virtual Event, Netherlands

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
