Abstract
In this study, we develop a deep learning-based stacking scheme to detect facial action units (AUs) in video data. Given a sequence of video frames, the scheme combines multiple cues extracted from AU detectors operating at the frame, segment, and transition levels. The frame-based detector takes a single frame and determines the existence of an AU from static face features. The segment-based detector examines subsequences of various lengths in the neighborhood of a frame to decide whether that frame belongs to an AU segment. The transition-based detector attempts to find transitions from neutral faces containing no AUs to emotional faces, or vice versa, by analyzing fixed-size subsequences. The frame subsequences in the segment and transition detectors are represented by motion history images, which model the temporal changes in the face. Each detector employs a separate convolutional neural network, and their results are then fed into a meta-classifier that learns how to combine them. Combining multiple cues at different levels within a framework built entirely of deep networks improves detection performance by both locating subtle AUs and tracking small movements of the facial muscles. Performance analysis shows that the proposed approach significantly outperforms state-of-the-art methods on the CK+, DISFA, and BP4D databases.
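The motion history image (MHI) representation used by the segment- and transition-level detectors can be illustrated with a short sketch in the style of Davis and Bobick's temporal templates. The parameter names (`tau`, `threshold`) and the final normalization step here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def motion_history_image(frames, tau=10, threshold=30):
    """Build a motion history image from a list of grayscale frames.

    Pixels that moved recently are assigned the maximal value tau;
    older motion decays linearly toward zero, so the result encodes
    both where and how recently the face changed.
    """
    frames = [f.astype(np.int16) for f in frames]  # avoid uint8 wrap-around
    mhi = np.zeros_like(frames[0], dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        motion = np.abs(curr - prev) >= threshold  # binarized frame difference
        mhi = np.where(motion, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # normalize to [0, 1]; most recent motion is brightest
```

The resulting single-channel image can then be fed to a convolutional network in place of the raw subsequence, which is how a fixed-size input is obtained from subsequences of varying length.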
References
Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Personality Soc. Psychol. 17(2), 124–129 (1971)
Pantic, M., Patras, I.: Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 36(2), 433–449 (2006)
Ding, X., Chu, W. S., De la Torre, F., Cohn, J. F., Wang, Q.: Facial action unit event detection by cascade of tasks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2400–2407 (2013)
Broekens, J.: Emotion and reinforcement: affective facial expressions facilitate robot learning. In: Artificial Intelligence for Human Computing, pp. 113–132. Springer, Berlin (2007)
Bravo, J. A., Forsythe, P., Chew, M. V., Escaravage, E., Savignac, H. M., Dinan, T. G., Cryan, J. F.: Ingestion of Lactobacillus strain regulates emotional behavior and central GABA receptor expression in a mouse via the vagus nerve. In: Proceedings of the National Academy of Sciences, 201102999 (2011)
Zhang, X., Yin, L., Cohn, J.F., Canavan, S., Reale, M., Horowitz, A., Girard, J.M.: Bp4d-spontaneous: a high-resolution spontaneous 3d dynamic facial expression database. Image Vis. Comput. 32(10), 692–706 (2014)
Duan, H., Shao, X., Hou, W., He, G., Zeng, Q.: An incremental learning algorithm for Lagrangian support vector machines. Pattern Recogn. Lett. 30(15), 1384–1391 (2009)
Jiang, B., Valstar, M. F., Pantic, M.: Action unit detection using sparse appearance descriptors in space-time video volumes. In: 2011 IEEE International Conference on Automatic Face Gesture Recognition and Workshops (FG 2011), pp. 314–321. IEEE (2011, March)
Tang, C., Zheng, W., Yan, J., Li, Q., Li, Y., Zhang, T., Cui, Z.: View-independent facial action unit detection. In: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), pp. 878–882. IEEE (2017, May)
Zhao, K., Chu, W. S., De la Torre, F., Cohn, J. F., Zhang, H.: Joint patch and multi-label learning for facial action unit detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2207–2216 (2015)
Zhao, K., Chu, W. S., Zhang, H.: Deep region and multi-label learning for facial action unit detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3391–3399 (2016)
Taigman, Y., Yang, M., Ranzato, M. A., Wolf, L.: Deepface: Closing the gap to human-level performance in face verification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708 (2014)
Romero, A., Leon, J., Arbelaez, P.: Multi-view dynamic facial action unit detection. Image Vis. Comput. (2018)
Shao, Z., Liu, Z., Cai, J., Wu, Y., Ma, L.: Facial action unit detection using attention and relation learning. IEEE Transactions on Affective Computing (2019)
Corneanu, C. A., Madadi, M., Escalera, S.: Deep structure inference network for facial action unit recognition. In: European Conference on Computer Vision. Springer, pp. 309–324 (2018)
De la Torre, F., Simon, T., Ambadar, Z., Cohn, J. F.: Fast-FACS: A computer-assisted system to increase speed and reliability of manual FACS coding. In: International Conference on Affective Computing and Intelligent Interaction, pp. 57–66. Springer, Berlin, Heidelberg (2011, October)
Zeng, J., Chu, W.S., De la Torre, F., Cohn, J.F., Xiong, Z.: Confidence preserving machine for facial action unit detection. In: IEEE International Conference on Computer Vision, pp. 3622–3630. IEEE (2015)
Rudovic, O., Pavlovic, V., Pantic, M.: Kernel conditional ordinal random fields for temporal segmentation of facial action units. In: Fusiello, A., Murino, V., Cucchiara, R. (eds.) Computer Vision – ECCV 2012. Workshops and Demonstrations. Lecture Notes in Computer Science, vol. 7584. Springer, Berlin, Heidelberg (2012)
Shao, Z., Liu, Z., Cai, J., Wu, Y., Ma, L.: Weakly-supervised attention and relation learning for facial action unit detection. IEEE Transactions on Affective Computing (2018)
Jaiswal, S., Valstar, M.: Deep learning the dynamic appearance and shape of facial action units. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–8. IEEE (2016, March)
Li, W., Abtahi, F., Zhu, Z.: Action unit detection with region adaptation, multi-labeling learning and optimal temporal fusing. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6766–6775. IEEE (2017, July)
Li, W., Abtahi, F., Zhu, Z., Yin, L.: Eac-net: A region-based deep enhancing and cropping approach for facial action unit detection (2017). arXiv preprint arXiv:1702.02925
Valstar, M.F., Pantic, M.: Fully automatic recognition of the temporal phases of facial actions. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 42(1), 28–43 (2012)
Pei, W., Dibeklioğlu, H., Tax, D.M., van der Maaten, L.: Multivariate time-series classification using the hidden-unit logistic model. IEEE Trans. Neural Netw. Learn. Syst. 29(4), 920–931 (2018)
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001 (Vol. 1, pp. I-I). IEEE (2001, December)
Zhang, Z., Zhai, S., Yin, L.: Identity-based Adversarial Training of Deep CNNs for Facial Action Unit Recognition. In: BMVC, p. 226 (2018, September)
Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Incremental face alignment in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1859–1866 (2014)
Davis, J. W., Bobick, A. F.: The representation and recognition of human movement using temporal templates. In: 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 1997), pp. 928–934. IEEE (1997, June)
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 94–101. IEEE (2010, June)
Mavadati, S.M., Mahoor, M.H., Bartlett, K., Trinh, P., Cohn, J.F.: Disfa: a spontaneous facial action intensity database. IEEE Trans. Affect. Comput. 4(2), 151–160 (2013)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)
Zhong, L., Liu, Q., Yang, P., Huang, J., Metaxas, D.N.: Learning multiscale active facial patches for expression analysis. IEEE Trans. Cybern. 45(8), 1499–1510 (2014)
Zhi, R., Liu, M., Zhang, D.: A comprehensive survey on automatic facial action unit analysis. Vis. Comput. 36(5), 1067–1093 (2020)
Martinez, B., Valstar, M. F., Jiang, B., Pantic, M.: Automatic analysis of facial actions: a survey. IEEE Transactions on Affective Computing (2017)
Sumathi, C.P., Santhanam, T., Mahadevi, M.: Automatic facial expression analysis a survey. Int. J. Comput. Sci. Eng. Surv. 3(6), 47 (2012)
Li, G., Zhu, X., Zeng, Y., Wang, Q., Lin, L.: Semantic relationships guided representation learning for facial action unit recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 8594–8601) (2019, July)
Liu, Z., Dong, J., Zhang, C., Wang, L., Dang, J.: Relation modeling with graph convolutional networks for facial action unit detection. In: International Conference on Multimedia Modeling, pp. 489–501. Springer, Cham (2020, January)
Shao, Z., Liu, Z., Cai, J., Ma, L.: Deep adaptive attention for joint facial action unit detection and face alignment. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 705–720 (2018)
Chu, W. S., De la Torre, F., Cohn, J. F.: Modeling spatial and temporal cues for multi-label facial action unit detection (2016). arXiv preprint arXiv:1608.00911
Song, T., Chen, L., Zheng, W., Ji, Q.: Uncertain Graph Neural Networks for Facial Action Unit Detection. (AAAI 2021) (2021)
Cui, Z., Song, T., Wang, Y., Ji, Q.: Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition. Advances in Neural Information Processing Systems, 33. (NeurIPS 2020) (2020)
Huang, Y., Qing, L., Xu, S., Wang, L., Peng, Y.: HybNet: a hybrid network structure for pain intensity estimation. Vis. Comput. 2021, 1–12 (2021)
Joseph, A., Geetha, P.: Facial emotion detection using modified eyemap-mouthmap algorithm on an enhanced image and classification with tensorflow. Vis. Comput. 36(3), 529–539 (2020)
Vinolin, V., Sucharitha, M.: Dual adaptive deep convolutional neural network for video forgery detection in 3D lighting environment. Vis. Comput., pp. 1–22 (2020)
Zhu, X., Chen, Z.: Dual-modality spatiotemporal feature learning for spontaneous facial expression recognition in e-learning using hybrid deep neural network. Vis. Comput. 2019, 1–13 (2019)
Danelakis, A., Theoharis, T., Pratikakis, I.: A robust spatio-temporal scheme for dynamic 3D facial expression retrieval. Vis. Comput. 32(2), 257–269 (2016)
Funding
This work is supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under Grant No. 115E310.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Akay, S., Arica, N. Stacking multiple cues for facial action unit detection. Vis Comput 38, 4235–4250 (2022). https://doi.org/10.1007/s00371-021-02291-3