DOI: 10.1145/3395035.3425254

Toward Mathematical Representation of Emotion: A Deep Multitask Learning Method Based On Multimodal Recognition

Published: 27 December 2020

Abstract

To emulate human emotions in agents, a mathematical representation of emotion (an emotional space) is essential for every component, such as emotion recognition, generation, and expression. In this study, we aim to acquire a modality-independent emotional space by extracting the emotional information shared across different modalities. We propose a method that acquires an emotional space by integrating multiple modalities in a deep neural network (DNN) and combining an emotion recognition task with a unification task: the emotion recognition task learns the representation of emotions, while the unification task learns an identical emotional space from each modality. Through experiments with audio-visual data, we confirmed that emotional spaces acquired from single modalities differ from one another, and that the proposed method can acquire a joint emotional space. We also showed that, under this paper's experimental conditions, the proposed method can adequately represent emotions in a low-dimensional emotional space of around five or six dimensions.
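
The following is a minimal sketch (in PyTorch-style Python, not the authors' code) of the multitask setup described above: two modality-specific encoders map audio and visual features into a shared low-dimensional emotional space, a shared classifier provides the emotion recognition loss, and a unification loss pulls the two modalities' embeddings of the same sample together. All names, layer sizes, the embedding dimensionality, and the choice of mean-squared error for the unification term are illustrative assumptions, not details taken from the paper.

    # Sketch of the multitask objective: emotion recognition + unification.
    # Dimensions and loss choices are assumptions; the paper's architecture may differ.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    EMO_DIM = 6        # assumed size of the shared emotional space (five or six dims)
    NUM_CLASSES = 8    # assumed number of emotion categories

    class ModalityEncoder(nn.Module):
        """Maps one modality's feature vector into the shared emotional space."""
        def __init__(self, in_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 256), nn.ReLU(),
                nn.Linear(256, EMO_DIM),
            )

        def forward(self, x):
            return self.net(x)

    audio_enc = ModalityEncoder(in_dim=128)       # e.g., pooled acoustic features
    visual_enc = ModalityEncoder(in_dim=512)      # e.g., pooled facial features
    classifier = nn.Linear(EMO_DIM, NUM_CLASSES)  # shared emotion-recognition head

    def multitask_loss(audio_x, visual_x, labels, alpha=1.0):
        za, zv = audio_enc(audio_x), visual_enc(visual_x)
        # Emotion recognition task: both modalities feed one shared classifier,
        # so the embedding must encode emotion regardless of modality.
        rec_loss = F.cross_entropy(classifier(za), labels) + \
                   F.cross_entropy(classifier(zv), labels)
        # Unification task: embeddings of the same sample from different
        # modalities are pulled toward the same point in the emotional space.
        uni_loss = F.mse_loss(za, zv)
        return rec_loss + alpha * uni_loss

In this sketch, minimizing the unification term alongside the recognition term is what encourages a single, modality-independent emotional space; alpha trades off how strictly the two modalities are aligned.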


Cited By

  • (2022) Audio-Visual Shared Emotion Representation for Robust Emotion Recognition on Modality Missing Using Hemi-hyperspherical Embedding and Latent Space Unification. HCI International 2022 Posters, 137-143. https://doi.org/10.1007/978-3-031-06388-6_18 (online publication date: 16 June 2022)

Published In

cover image ACM Conferences
ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction
October 2020
548 pages
ISBN: 9781450380027
DOI: 10.1145/3395035
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 December 2020

Author Tags

  1. affective computing
  2. deep neural networks
  3. emotional space
  4. multimodal fusion
  5. multitask learning

Qualifiers

  • Short-paper

Funding Sources

  • The Ministry of Education, Culture, Sports, Science and Technology-Japan, Grant-in-Aid for Scientific Research

Conference

ICMI '20
Sponsor:
ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
October 25 - 29, 2020
Virtual Event, Netherlands

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
