Research article
DOI: 10.1145/3136755.3143010

Group emotion recognition in the wild by combining deep neural networks for facial expression classification and scene-context analysis

Asad Abbas and Stephan K. Chalup

Published: 03 November 2017

Abstract

This paper presents the implementation details of a proposed solution to the Emotion Recognition in the Wild 2017 Challenge, in the category of group-level emotion recognition. The objective of this sub-challenge is to classify a group's emotion as Positive, Neutral or Negative. Our proposed approach incorporates both image context and facial information extracted from an image for classification. We use Convolutional Neural Networks (CNNs) to predict facial emotions from detected faces present in an image. Predicted facial emotions are combined with scene-context information extracted by another CNN using fully connected neural network layers. Various techniques are explored by combining and training these two Deep Neural Network models in order to perform group-level emotion recognition. We evaluate our approach on the Group Affective Database 2.0 provided with the challenge. Experimental evaluations show promising performance improvements, resulting in approximately 37% improvement over the competition's baseline model on the validation dataset.
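The architecture the abstract describes (per-face emotion CNN, scene-context CNN, fusion through fully connected layers) can be sketched as follows. This is an illustrative PyTorch sketch, not the authors' implementation: the tiny stand-in CNNs, the 48×48 face crops, the feature dimensions, and the averaging of per-face predictions into a fixed-size descriptor are all assumptions made so the example runs end-to-end.

```python
import torch
import torch.nn as nn

class FaceEmotionCNN(nn.Module):
    """Stand-in for the per-face emotion classifier (hypothetical, untrained)."""
    def __init__(self, n_emotions: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(16 * 4 * 4, n_emotions)

    def forward(self, x):  # x: (n_faces, 1, 48, 48) grayscale face crops
        return self.head(self.features(x).flatten(1))  # per-face emotion logits

class SceneContextCNN(nn.Module):
    """Stand-in for the scene-context feature extractor (hypothetical)."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(16 * 4 * 4, feat_dim)

    def forward(self, x):  # x: (1, 3, H, W) whole image
        return self.head(self.features(x).flatten(1))  # scene feature vector

class GroupEmotionFusion(nn.Module):
    """Fuse pooled face-emotion predictions with scene features via FC layers."""
    def __init__(self, n_emotions: int = 7, scene_dim: int = 32, n_classes: int = 3):
        super().__init__()
        self.face_cnn = FaceEmotionCNN(n_emotions)
        self.scene_cnn = SceneContextCNN(scene_dim)
        self.fuse = nn.Sequential(
            nn.Linear(n_emotions + scene_dim, 64), nn.ReLU(),
            nn.Linear(64, n_classes),  # Positive / Neutral / Negative
        )

    def forward(self, faces, scene):
        # Average per-face emotion probabilities so the face descriptor has a
        # fixed size regardless of how many faces were detected in the image.
        face_probs = self.face_cnn(faces).softmax(dim=1).mean(dim=0, keepdim=True)
        scene_feat = self.scene_cnn(scene)
        return self.fuse(torch.cat([face_probs, scene_feat], dim=1))

model = GroupEmotionFusion()
faces = torch.randn(5, 1, 48, 48)    # 5 detected face crops
scene = torch.randn(1, 3, 128, 128)  # the whole group image
logits = model(faces, scene)
print(logits.shape)  # torch.Size([1, 3])
```

Mean-pooling the face predictions is one simple way to handle a variable number of faces per image; the paper explores several ways of combining and jointly training the two networks.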




Published In

ICMI '17: Proceedings of the 19th ACM International Conference on Multimodal Interaction
November 2017, 676 pages
ISBN: 9781450355438
DOI: 10.1145/3136755

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

Affective Computing; Deep Neural Networks; Facial Emotion Recognition; Group Emotion Recognition

Conference

ICMI '17

Acceptance Rates

ICMI '17 paper acceptance rate: 65 of 149 submissions (44%)
Overall acceptance rate: 453 of 1,080 submissions (42%)


Cited By

• (2024) Implementing the Affective Mechanism for Group Emotion Recognition With a New Graph Convolutional Network Architecture. IEEE Transactions on Affective Computing 15(3), 1104–1115. DOI: 10.1109/TAFFC.2023.3320101
• (2023) EmotiW 2023: Emotion Recognition in the Wild Challenge. Proceedings of the 25th International Conference on Multimodal Interaction, 746–749. DOI: 10.1145/3577190.3616545
• (2023) A Self-Fusion Network Based on Contrastive Learning for Group Emotion Recognition. IEEE Transactions on Computational Social Systems 10(2), 458–469. DOI: 10.1109/TCSS.2022.3202249
• (2023) Audio-Visual Automatic Group Affect Analysis. IEEE Transactions on Affective Computing 14(2), 1056–1069. DOI: 10.1109/TAFFC.2021.3104170
• (2023) Automatic Emotion Recognition for Groups: A Review. IEEE Transactions on Affective Computing 14(1), 89–107. DOI: 10.1109/TAFFC.2021.3065726
• (2023) Cohesive Group Emotion Recognition using Deep Learning. 26th ACIS International Winter Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD-Winter), 264–269. DOI: 10.1109/SNPD-Winter57765.2023.10466291
• (2023) Cohesive Group Emotion Recognition using Deep Learning. IEEE/ACIS 8th International Conference on Big Data, Cloud Computing, and Data Science (BCD), 264–269. DOI: 10.1109/BCD57833.2023.10466291
• (2023) Social Event Context and Affect Prediction in Group Videos. 11th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 1–8. DOI: 10.1109/ACIIW59127.2023.10388162
• (2023) A recent survey on perceived group sentiment analysis. Journal of Visual Communication and Image Representation 97, 103988. DOI: 10.1016/j.jvcir.2023.103988
• (2023) Facial Emotion Recognition in-the-Wild Using Deep Neural Networks: A Comprehensive Review. SN Computer Science 5(1). DOI: 10.1007/s42979-023-02423-7
