Research article
DOI: 10.1145/3581783.3613784

Efficient Labelling of Affective Video Datasets via Few-Shot & Multi-Task Contrastive Learning

Published: 27 October 2023

Abstract

Whilst deep learning techniques have achieved excellent emotion prediction, they still require large amounts of labelled training data, which are (a) onerous and tedious to compile, and (b) prone to errors and biases. We propose Multi-Task Contrastive Learning for Affect Representation (MT-CLAR) for few-shot affect inference. MT-CLAR combines multi-task learning with a Siamese network trained via contrastive learning to infer, from a pair of expressive facial images, (a) the (dis)similarity between the facial expressions, and (b) the difference in valence and arousal levels of the two faces. We further extend the image-based MT-CLAR framework for automated video labelling where, given one or a few labelled video frames (termed the support set), MT-CLAR labels the remainder of the video for valence and arousal. Experiments are performed on the AFEW-VA dataset with multiple support-set configurations; moreover, supervised learning on representations learnt via MT-CLAR is used for valence, arousal and categorical emotion prediction on the AffectNet and AFEW-VA datasets. The results show that valence and arousal predictions via MT-CLAR are comparable to the state-of-the-art (SOTA), and that we significantly outperform the SOTA with a support set ≈6% the size of the video dataset.
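The abstract's description of MT-CLAR maps naturally onto a small amount of code. The sketch below is a rough illustration only, not the authors' published implementation: it pairs a shared-weight encoder with two task heads, one trained with a margin-based contrastive loss on expression (dis)similarity and one regressing the valence/arousal difference of the pair. The backbone (ResNet-18), embedding size, margin and loss weighting are all assumptions of ours, as are the class and function names.

```python
# Minimal sketch of an MT-CLAR-style Siamese multi-task model (illustrative only;
# backbone, head sizes, margin and loss weights are assumptions, not the
# authors' published configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class SiameseMTCLAR(nn.Module):
    """Shared-weight encoder with two task heads:
    (a) expression (dis)similarity via embedding distance (contrastive task),
    (b) regression of the valence/arousal difference between the two faces."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        backbone = models.resnet18(weights=None)   # assumed backbone
        backbone.fc = nn.Identity()                # expose 512-d features
        self.encoder = backbone
        self.proj = nn.Linear(512, embed_dim)      # embedding for the contrastive task
        self.delta_va = nn.Sequential(             # head predicting (Δvalence, Δarousal)
            nn.Linear(2 * embed_dim, 64), nn.ReLU(), nn.Linear(64, 2)
        )

    def embed(self, x):
        return F.normalize(self.proj(self.encoder(x)), dim=-1)

    def forward(self, x1, x2):
        z1, z2 = self.embed(x1), self.embed(x2)
        dist = torch.norm(z1 - z2, dim=-1)                # (dis)similarity score
        dva = self.delta_va(torch.cat([z1, z2], dim=-1))  # predicted VA difference
        return dist, dva


def mtclar_loss(dist, same_expr, dva_pred, dva_true, margin=1.0, alpha=1.0):
    """Joint multi-task objective: contrastive + VA-difference regression."""
    # Chopra et al.-style contrastive term: pull same-expression pairs together,
    # push different-expression pairs at least `margin` apart.
    contrastive = (same_expr * dist.pow(2)
                   + (1 - same_expr) * F.relu(margin - dist).pow(2)).mean()
    # Regression term on the valence/arousal difference of the pair.
    regression = F.mse_loss(dva_pred, dva_true)
    return contrastive + alpha * regression  # alpha: assumed task weighting
```

Under this reading, few-shot video labelling would amount to predicting, for each unlabelled frame, its valence/arousal offset from one or more labelled support frames and adding that offset to the support labels; consult the paper itself for the actual architecture, loss formulation and training details.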



Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. arousal
  2. contrastive learning
  3. emotion category
  4. few-shot
  5. multi-task
  6. siamese network
  7. similarity
  8. valence
  9. video labelling

Qualifiers

  • Research-article

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada


Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 73
  • Downloads (last 6 weeks): 9

Reflects downloads up to 05 Mar 2025

Cited By

  • (2024) Exploring Electroencephalography-Based Affective Analysis and Detection of Parkinson's Disease. Intelligent Computing, Vol. 3. https://doi.org/10.34133/icomputing.0084. Online publication date: 17-Oct-2024.
  • (2024) MARS: A Multiview Contrastive Approach to Human Activity Recognition From Accelerometer Sensor. IEEE Sensors Letters, Vol. 8, 3, 1-4. https://doi.org/10.1109/LSENS.2024.3357941. Online publication date: Mar-2024.
  • (2024) MCAN: An Efficient Multi-Task Network for Facial Expression Analysis. 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 1037-1042. https://doi.org/10.1109/CSCWD61410.2024.10580014. Online publication date: 8-May-2024.
  • (2024) Neuromorphic valence and arousal estimation. Journal of Ambient Intelligence and Humanized Computing. https://doi.org/10.1007/s12652-024-04885-w. Online publication date: 26-Oct-2024.