Abstract
In this study, we develop a deep learning-based stacking scheme to detect facial action units (AUs) in video data. Given a sequence of video frames, the scheme combines multiple cues extracted from AU detectors operating at the frame, segment, and transition levels. The frame-based detector takes a single frame and determines the existence of an AU from static face features. The segment-based detector examines subsequences of various lengths in the neighborhood of a frame to decide whether that frame belongs to an AU segment. The transition-based detector attempts to find transitions from neutral faces containing no AUs to emotional faces, or vice versa, by analyzing fixed-size subsequences. The frame subsequences in the segment and transition detectors are represented by motion history images, which model the temporal changes in the face. Each detector employs a separate convolutional neural network, and their results are then fed into a meta-classifier that learns how to combine them. Combining multiple cues at different levels within a framework built entirely of deep networks improves detection performance by both locating subtle AUs and tracking small movements of the facial muscles. Performance analysis shows that the proposed approach significantly outperforms state-of-the-art methods on the CK+, DISFA, and BP4D databases.
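The motion history image (MHI) representation used by the segment- and transition-level detectors can be illustrated with a short sketch in the style of Davis and Bobick's temporal templates. The parameter names (`tau`, `threshold`) and the final normalization step here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def motion_history_image(frames, tau=10, threshold=30):
    """Build a motion history image from a list of grayscale frames.

    Pixels that moved recently are assigned the maximal value tau;
    older motion decays linearly toward zero, so the result encodes
    both where and how recently the face changed.
    """
    frames = [f.astype(np.int16) for f in frames]  # avoid uint8 wrap-around
    mhi = np.zeros_like(frames[0], dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        motion = np.abs(curr - prev) >= threshold  # binarized frame difference
        mhi = np.where(motion, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # normalize to [0, 1]; most recent motion is brightest
```

The resulting single-channel image can then be fed to a convolutional network in place of the raw subsequence, which is how a fixed-size input is obtained from subsequences of varying length.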
References
Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Personality Soc. Psychol. 17(2), 124–129 (1971)
Pantic, M., Patras, I.: Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 36(2), 433–449 (2006)
Ding, X., Chu, W. S., De la Torre, F., Cohn, J. F., Wang, Q.: Facial action unit event detection by cascade of tasks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2400–2407 (2013)
Broekens, J.: Emotion and reinforcement: affective facial expressions facilitate robot learning. In: Artificial Intelligence for Human Computing, pp. 113–132. Springer, Berlin (2007)
Bravo, J. A., Forsythe, P., Chew, M. V., Escaravage, E., Savignac, H. M., Dinan, T. G., Cryan, J. F.: Ingestion of Lactobacillus strain regulates emotional behavior and central GABA receptor expression in a mouse via the vagus nerve. In: Proceedings of the National Academy of Sciences, 201102999 (2011)
Zhang, X., Yin, L., Cohn, J.F., Canavan, S., Reale, M., Horowitz, A., Girard, J.M.: Bp4d-spontaneous: a high-resolution spontaneous 3d dynamic facial expression database. Image Vis. Comput. 32(10), 692–706 (2014)
Duan, H., Shao, X., Hou, W., He, G., Zeng, Q.: An incremental learning algorithm for Lagrangian support vector machines. Pattern Recogn. Lett. 30(15), 1384–1391 (2009)
Jiang, B., Valstar, M. F., Pantic, M.: Action unit detection using sparse appearance descriptors in space-time video volumes. In: 2011 IEEE International Conference on Automatic Face Gesture Recognition and Workshops (FG 2011), pp. 314–321. IEEE (2011, March)
Tang, C., Zheng, W., Yan, J., Li, Q., Li, Y., Zhang, T., Cui, Z.: View-independent facial action unit detection. In: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition (FG 2017), pp. 878–882. IEEE (2017, May)
Zhao, K., Chu, W. S., De la Torre, F., Cohn, J. F., Zhang, H.: Joint patch and multi-label learning for facial action unit detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2207–2216 (2015)
Zhao, K., Chu, W. S., Zhang, H.: Deep region and multi-label learning for facial action unit detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3391–3399 (2016)
Taigman, Y., Yang, M., Ranzato, M. A., Wolf, L.: Deepface: Closing the gap to human-level performance in face verification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708 (2014)
Romero, A., Leon, J., Arbelaez, P.: Multi-view dynamic facial action unit detection. Image Vis. Comput. (2018)
Shao, Z., Liu, Z., Cai, J., Wu, Y., Ma, L.: Facial action unit detection using attention and relation learning. IEEE Transactions on Affective Computing (2019)
Corneanu, C. A., Madadi, M., Escalera, S.: Deep structure inference network for facial action unit recognition. In: European Conference on Computer Vision. Springer, pp. 309–324 (2018)
De la Torre, F., Simon, T., Ambadar, Z., Cohn, J. F.: Fast-FACS: A computer-assisted system to increase speed and reliability of manual FACS coding. In: International Conference on Affective Computing and Intelligent Interaction, pp. 57–66. Springer, Berlin, Heidelberg (2011, October)
Zeng, J., Chu, W.S., De la Torre, F., Cohn, J.F., Xiong, Z.: Confidence preserving machine for facial action unit detection. In: IEEE International Conference on Computer Vision, pp. 3622–3630. IEEE (2015)
Rudovic, O., Pavlovic, V., Pantic, M.: Kernel conditional ordinal random fields for temporal segmentation of facial action units. In: Fusiello, A., Murino, V., Cucchiara, R. (eds.) Computer Vision – ECCV 2012. Workshops and Demonstrations. Lecture Notes in Computer Science, vol. 7584. Springer, Berlin, Heidelberg (2012)
Shao, Z., Liu, Z., Cai, J., Wu, Y., Ma, L.: Weakly-supervised attention and relation learning for facial action unit detection. IEEE Transactions on Affective Computing (2018)
Jaiswal, S., Valstar, M.: Deep learning the dynamic appearance and shape of facial action units. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–8. IEEE (2016, March)
Li, W., Abtahi, F., Zhu, Z.: Action unit detection with region adaptation, multi-labeling learning and optimal temporal fusing. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6766–6775. IEEE (2017, July)
Li, W., Abtahi, F., Zhu, Z., Yin, L.: Eac-net: A region-based deep enhancing and cropping approach for facial action unit detection (2017). arXiv preprint arXiv:1702.02925
Valstar, M.F., Pantic, M.: Fully automatic recognition of the temporal phases of facial actions. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 42(1), 28–43 (2012)
Pei, W., Dibeklioğlu, H., Tax, D.M., van der Maaten, L.: Multivariate time-series classification using the hidden-unit logistic model. IEEE Trans. Neural Netw. Learn. Syst. 29(4), 920–931 (2018)
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001 (Vol. 1, pp. I-I). IEEE (2001, December)
Zhang, Z., Zhai, S., Yin, L.: Identity-based Adversarial Training of Deep CNNs for Facial Action Unit Recognition. In: BMVC, p. 226 (2018, September)
Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Incremental face alignment in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1859–1866 (2014)
Davis, J. W., Bobick, A. F.: The representation and recognition of human movement using temporal templates. In: 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 1997), pp. 928–934. IEEE (1997, June)
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 94–101. IEEE (2010, June)
Mavadati, S.M., Mahoor, M.H., Bartlett, K., Trinh, P., Cohn, J.F.: Disfa: a spontaneous facial action intensity database. IEEE Trans. Affect. Comput. 4(2), 151–160 (2013)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)
Zhong, L., Liu, Q., Yang, P., Huang, J., Metaxas, D.N.: Learning multiscale active facial patches for expression analysis. IEEE Trans. Cybern. 45(8), 1499–1510 (2014)
Zhi, R., Liu, M., Zhang, D.: A comprehensive survey on automatic facial action unit analysis. Vis. Comput. 36(5), 1067–1093 (2020)
Martinez, B., Valstar, M. F., Jiang, B., Pantic, M.: Automatic analysis of facial actions: a survey. IEEE Transactions on Affective Computing (2017)
Sumathi, C.P., Santhanam, T., Mahadevi, M.: Automatic facial expression analysis a survey. Int. J. Comput. Sci. Eng. Surv. 3(6), 47 (2012)
Li, G., Zhu, X., Zeng, Y., Wang, Q., Lin, L.: Semantic relationships guided representation learning for facial action unit recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 8594–8601) (2019, July)
Liu, Z., Dong, J., Zhang, C., Wang, L., Dang, J.: Relation modeling with graph convolutional networks for facial action unit detection. In: International Conference on Multimedia Modeling, pp. 489–501. Springer, Cham (2020, January)
Shao, Z., Liu, Z., Cai, J., Ma, L.: Deep adaptive attention for joint facial action unit detection and face alignment. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 705–720 (2018)
Chu, W. S., De la Torre, F., Cohn, J. F.: Modeling spatial and temporal cues for multi-label facial action unit detection (2016). arXiv preprint arXiv:1608.00911
Song, T., Chen, L., Zheng, W., Ji, Q.: Uncertain Graph Neural Networks for Facial Action Unit Detection. (AAAI 2021) (2021)
Cui, Z., Song, T., Wang, Y., Ji, Q.: Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition. Advances in Neural Information Processing Systems, 33. (NeurIPS 2020) (2020)
Huang, Y., Qing, L., Xu, S., Wang, L., Peng, Y.: HybNet: a hybrid network structure for pain intensity estimation. Vis. Comput. 2021, 1–12 (2021)
Joseph, A., Geetha, P.: Facial emotion detection using modified eyemap-mouthmap algorithm on an enhanced image and classification with tensorflow. Vis. Comput. 36(3), 529–539 (2020)
Vinolin, V., Sucharitha, M.: Dual adaptive deep convolutional neural network for video forgery detection in 3D lighting environment. Vis. Comput., pp. 1–22 (2020)
Zhu, X., Chen, Z.: Dual-modality spatiotemporal feature learning for spontaneous facial expression recognition in e-learning using hybrid deep neural network. Vis. Comput. 2019, 1–13 (2019)
Danelakis, A., Theoharis, T., Pratikakis, I.: A robust spatio-temporal scheme for dynamic 3D facial expression retrieval. Vis. Comput. 32(2), 257–269 (2016)
Funding
This work is supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under Grant No. 115E310.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Akay, S., Arica, N. Stacking multiple cues for facial action unit detection. Vis Comput 38, 4235–4250 (2022). https://doi.org/10.1007/s00371-021-02291-3