Abstract
Although standard convolutional neural networks (CNNs) have been applied to improve the robustness of facial action unit (AU) detection to pose variations, their detection performance is limited because standard CNNs are not sufficiently robust to affine transformations. To address this issue, two novel architectures, termed AUCaps and AUCaps++, are proposed in this work for multi-view and multi-label facial AU detection. In both architectures, one or more dense blocks are stacked with a capsule network (CapsNet). Specifically, the dense blocks placed before the CapsNet learn more discriminative high-level AU features, while the CapsNet learns more view-invariant AU features. Moreover, the capsule types and the digit-capsule dimension are optimized to avoid the computation and storage burden caused by dynamic routing in the standard CapsNet. Because AUCaps and AUCaps++ are trained by jointly optimizing a multi-label AU loss and a viewpoint-image reconstruction loss, the proposed method achieves high F1 scores and reconstructs a rough human face across different AUs. Within-dataset and cross-dataset experiments on two public datasets show that the average F1 scores of the proposed method outperform competitors using hand-crafted features or deep learning features by a large margin.
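The joint objective described above (a multi-label AU loss combined with a weighted viewpoint-image reconstruction loss) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of per-AU sigmoid cross-entropy, and the weighting factor `lam` (following the small reconstruction weight used in the original CapsNet paper) are assumptions.

```python
import numpy as np

def joint_loss(au_logits, au_labels, recon, image, lam=0.0005):
    """Hypothetical joint objective: multi-label AU detection loss
    plus a down-weighted reconstruction loss of the viewpoint image."""
    # Multi-label AU loss: each AU is an independent binary decision,
    # so apply a sigmoid per AU and average the cross-entropy terms.
    probs = 1.0 / (1.0 + np.exp(-au_logits))
    eps = 1e-7  # numerical guard for log(0)
    au_loss = -np.mean(au_labels * np.log(probs + eps)
                       + (1.0 - au_labels) * np.log(1.0 - probs + eps))
    # Reconstruction loss: summed squared pixel error, scaled down by lam
    # so it regularizes rather than dominates the AU loss.
    recon_loss = np.sum((recon - image) ** 2)
    return au_loss + lam * recon_loss
```

With zero logits (probability 0.5 for every AU) and a perfect reconstruction, the loss reduces to the binary cross-entropy of an uninformed classifier, log 2.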
Acknowledgment
This work was supported in part by the Beijing Nova Program (Z201100006820123) of the Beijing Municipal Science and Technology Commission, in part by the Natural Science Foundation of China under Grants U1536203 and 61972169, in part by the National Key Research and Development Program of China (2016QY01W0200), and in part by the Major Scientific and Technological Project of Hubei Province (2018AAA068 and 2019AAA051).
Ren, D., Wen, X., Chen, J. et al. Multi-view facial action unit detection via DenseNets and CapsNets. Multimed Tools Appl 81, 19377–19394 (2022). https://doi.org/10.1007/s11042-021-11147-w