DOI: 10.1145/3536221.3557033
Short paper

Multimodal Representations and Assessments of Emotional Fluctuations of Speakers in Call Centers Conversations

Published: 07 November 2022

ABSTRACT

My Ph.D. research addresses the multimodal representation and assessment of emotional fluctuations of speakers in call center conversations. Emotion detection in human conversations has attracted increasing attention from researchers over the last three decades. Various machine learning models have been developed, moving from the detection of six basic emotions to more subtle, complex dimensional emotions, and they have demonstrated promising results. In real-life use cases, however, the complexity of the data and the cost of human annotation remain challenging. In my research, I will work on a variety of real-life conversations, focusing on real-life data processing, emotional data annotation, and multimodal emotion recognition system design, in order to build robust and ethical automatic emotion recognition systems.
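As a rough illustration of the dimensional (valence/arousal) recognizers the abstract refers to, the sketch below fits a support-vector regressor on crude utterance-level acoustic statistics. This is a minimal sketch, not the system described here: the feature set, the synthetic corpus, and the SVR choice are illustrative assumptions.

# Minimal sketch (illustrative, not the author's system): continuous
# valence/arousal regression from hand-crafted acoustic features.
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import train_test_split

def acoustic_features(waveform: np.ndarray, frame_len: int = 400) -> np.ndarray:
    """Crude utterance-level descriptors: per-frame energy and
    zero-crossing rate, each summarized by mean and std."""
    frames = waveform[: len(waveform) // frame_len * frame_len].reshape(-1, frame_len)
    energy = (frames ** 2).mean(axis=1)
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])

rng = np.random.default_rng(0)
# Synthetic stand-in corpus: 200 "utterances" with random continuous labels.
X = np.stack([acoustic_features(rng.normal(size=16000)) for _ in range(200)])
y = rng.uniform(-1.0, 1.0, size=(200, 2))  # columns: valence, arousal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# SVR is single-output, so wrap it to predict both dimensions jointly.
model = MultiOutputRegressor(SVR(kernel="rbf")).fit(X_tr, y_tr)
print("predicted (valence, arousal):", model.predict(X_te[:1])[0])

In practice the hand-crafted statistics above would be replaced by a richer parameter set and real annotated labels; the pipeline shape (features, regressor, continuous two-dimensional output) is the point of the example.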


Published in

ICMI '22: Proceedings of the 2022 International Conference on Multimodal Interaction
November 2022, 830 pages
ISBN: 9781450393904
DOI: 10.1145/3536221
Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


