DOI: 10.1145/3581783.3613784
Research article

Efficient Labelling of Affective Video Datasets via Few-Shot & Multi-Task Contrastive Learning

Published: 27 October 2023

ABSTRACT

Whilst deep learning techniques have achieved excellent emotion prediction, they still require large amounts of labelled training data, which are (a) onerous and tedious to compile, and (b) prone to errors and biases. We propose Multi-Task Contrastive Learning for Affect Representation (MT-CLAR) for few-shot affect inference. MT-CLAR combines multi-task learning with a Siamese network trained via contrastive learning to infer, from a pair of expressive facial images, (a) the (dis)similarity between the facial expressions, and (b) the difference in valence and arousal levels of the two faces. We further extend the image-based MT-CLAR framework to automated video labelling where, given one or a few labelled video frames (termed the support-set), MT-CLAR labels the remainder of the video for valence and arousal. Experiments are performed on the AFEW-VA dataset with multiple support-set configurations; moreover, supervised learning on representations learnt via MT-CLAR is used for valence, arousal and categorical emotion prediction on the AffectNet and AFEW-VA datasets. The results show that valence and arousal predictions via MT-CLAR are comparable to the state-of-the-art (SOTA), and that we significantly outperform SOTA with a support-set ≈6% the size of the video dataset.
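
The abstract outlines, but does not fully specify, the MT-CLAR architecture. As a rough illustration only, the sketch below shows one way a Siamese, multi-task, contrastive setup of this kind could be wired up in PyTorch. The backbone choice (ResNet-18), head sizes, loss weighting, the sign convention for the predicted (ΔV, ΔA) and the support-set averaging scheme are all assumptions made here for illustration and are not taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of a Siamese,
# multi-task contrastive model in the spirit of MT-CLAR.
# Backbone, head sizes, loss weights and the ΔV/ΔA sign convention
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class SiameseMTCLAR(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)   # assumed backbone
        backbone.fc = nn.Identity()                # expose 512-d embeddings
        self.encoder = backbone                    # shared (Siamese) encoder
        # Task 1: are the two facial expressions (dis)similar?
        self.similarity_head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        # Task 2: difference in valence and arousal between the two faces,
        # taken here as (V_b - V_a, A_b - A_a).
        self.delta_va_head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, img_a, img_b):
        z_a, z_b = self.encoder(img_a), self.encoder(img_b)
        pair = torch.cat([z_a, z_b], dim=1)
        sim_logit = self.similarity_head(pair).squeeze(1)
        delta_va = self.delta_va_head(pair)
        return z_a, z_b, sim_logit, delta_va


def multitask_loss(z_a, z_b, sim_logit, delta_va,
                   same_label, true_delta, margin=1.0, w_sim=1.0, w_va=1.0):
    """Contrastive loss on embeddings plus auxiliary similarity and ΔV/ΔA terms."""
    d = torch.norm(z_a - z_b, dim=1)
    contrastive = torch.where(same_label.bool(),
                              d.pow(2),
                              F.relu(margin - d).pow(2)).mean()
    sim_bce = F.binary_cross_entropy_with_logits(sim_logit, same_label.float())
    va_mse = F.mse_loss(delta_va, true_delta)
    return contrastive + w_sim * sim_bce + w_va * va_mse


@torch.no_grad()
def label_video_frames(model, support_imgs, support_va, query_imgs):
    """Few-shot labelling sketch: pair every unlabelled (query) frame with the
    labelled support frames and shift the support labels by the predicted
    (ΔV, ΔA), averaging over the support set."""
    model.eval()
    preds = []
    for q in query_imgs:                                   # q: (3, H, W)
        q_batch = q.unsqueeze(0).expand(support_imgs.size(0), -1, -1, -1)
        _, _, _, delta = model(support_imgs, q_batch)      # query is image "b"
        preds.append((support_va + delta).mean(dim=0))     # V/A of the query
    return torch.stack(preds)                              # (num_queries, 2)
```

Under these assumptions, few-shot video labelling reduces to pairing each unlabelled frame with the labelled support frames and shifting the support labels by the predicted valence/arousal differences.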

Published in

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions (24%)
