DOI: 10.1145/3474085.3475295

CaFGraph: Context-aware Facial Multi-graph Representation for Facial Action Unit Recognition

Published: 17 October 2021

ABSTRACT

Facial action unit (AU) recognition has attracted increasing attention due to its indispensable role in affective computing, especially in affective human-computer interaction. Because AUs are subtle and transient, it is challenging to capture the delicate and ambiguous motions in local facial regions across consecutive frames. Considering that context is essential for resolving ambiguity in the human visual system, modeling context within or among facial images emerges as a promising approach to the AU recognition task. To this end, we propose CaFGraph, a novel context-aware facial multi-graph that models both morphology- and muscle-based region-level local context and region-level temporal context. CaFGraph is the first work to construct a universal facial multi-graph structure, independent of both task settings and dataset statistics, for almost all fine-grained facial behavior analysis tasks, including but not limited to AU recognition. To make full use of this context, we then present CaFNet, which learns context-aware facial graph representations via CaFGraph from facial images for multi-label AU recognition. Experiments on two widely used benchmark datasets, BP4D and DISFA, demonstrate the superiority of CaFNet over state-of-the-art methods.
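
To illustrate how graph representations over facial-region nodes can feed a multi-label AU classifier, the following PyTorch sketch applies standard graph convolutions (in the Kipf & Welling style) to region features under a fixed, pre-normalized adjacency. All names, dimensions, and the pooling/classification choices are assumptions for illustration only; this is not the authors' CaFNet architecture or the paper's actual multi-graph construction.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph-convolution layer over facial-region nodes (Kipf & Welling style)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        # x: (batch, num_regions, in_dim) node features from local facial regions
        # adj_norm: (num_regions, num_regions) normalized adjacency of a facial graph
        h = torch.einsum('ij,bjd->bid', adj_norm, x)  # propagate features along graph edges
        return torch.relu(self.linear(h))

class AUGraphHead(nn.Module):
    """Aggregates region-node representations into multi-label AU logits (illustrative)."""
    def __init__(self, node_dim, hidden_dim, num_aus):
        super().__init__()
        self.gc1 = GraphConvLayer(node_dim, hidden_dim)
        self.gc2 = GraphConvLayer(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_aus)

    def forward(self, node_feats, adj_norm):
        h = self.gc1(node_feats, adj_norm)
        h = self.gc2(h, adj_norm)
        pooled = h.mean(dim=1)          # pool over region nodes
        return self.classifier(pooled)  # logits; train with a multi-label loss

# Illustrative usage: 20 facial-region nodes, 12 AUs (as annotated in BP4D)
model = AUGraphHead(node_dim=512, hidden_dim=256, num_aus=12)
x = torch.randn(4, 20, 512)                      # region features from a CNN backbone
adj = torch.softmax(torch.randn(20, 20), dim=1)  # stand-in for a normalized adjacency
logits = model(x, adj)                           # (4, 12) multi-label AU logits
```

In a real pipeline, the adjacency would encode the morphological/muscular and temporal context relations described in the paper rather than a random stand-in, and training would use a multi-label objective such as BCEWithLogitsLoss.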

Supplemental Material

MM21-fp709.mp4 (mp4, 355.9 MB)


Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article

      Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%

