Abstract
Correlations among facial action units (AUs) convey significant information for AU detection, yet they have not been thoroughly exploited. Most existing methods either learn the regional correlation distribution of each AU or reason about the dependencies among AUs. However, they typically either predefine the correlations from prior knowledge, which discards useful information, or learn the correlations directly under the guidance of AU detection, which admits irrelevant information. To overcome these limitations, we propose a novel hybrid relational reasoning framework for AU detection. In particular, we adaptively reason pixel-level correlations of each AU, under both the constraint of regional correlations predefined by facial landmarks and the supervision of AU detection. Moreover, we adaptively reason AU-level correlations with a graph convolutional network that considers both predefined AU relationships and learnable relationship weights. Our framework thus integrates the advantages of correlation predefinition and correlation learning. Extensive experiments demonstrate that our approach (i) consistently outperforms state-of-the-art AU detection methods on the challenging BP4D, DISFA, and GFT benchmarks, and (ii) precisely reasons the regional correlation distribution of each AU.
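The AU-level reasoning described above, a graph convolution over AU nodes that combines a predefined relationship matrix with learnable relationship weights, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class name `HybridAUGraphConv`, the identity-matrix prior, the near-zero-initialized learnable residual, and the softmax row-normalization are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridAUGraphConv(nn.Module):
    """One graph-convolution layer over AU nodes that mixes a fixed,
    predefined AU relationship matrix with a learnable residual.
    Illustrative sketch only; details are assumptions, not the paper's code.
    """

    def __init__(self, num_aus: int, in_dim: int, out_dim: int,
                 prior_adj: torch.Tensor):
        super().__init__()
        # Fixed prior adjacency (e.g., derived from known AU co-occurrences).
        self.register_buffer("prior_adj", prior_adj)
        # Learnable relationship weights, initialized at zero so that
        # training starts from the predefined graph.
        self.residual_adj = nn.Parameter(torch.zeros(num_aus, num_aus))
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_aus, in_dim) per-AU feature vectors.
        # Row-normalize the combined prior + learned relationships.
        adj = F.softmax(self.prior_adj + self.residual_adj, dim=-1)
        x = adj @ x  # aggregate features from related AU nodes
        return F.relu(self.linear(x))


num_aus, in_dim, out_dim = 12, 64, 64
prior = torch.eye(num_aus)  # placeholder prior; a real one encodes AU relations
layer = HybridAUGraphConv(num_aus, in_dim, out_dim, prior)
feats = torch.randn(2, num_aus, in_dim)
out = layer(feats)
print(out.shape)  # torch.Size([2, 12, 64])
```

Because the learnable residual starts at zero, the layer initially propagates features exactly along the predefined graph and only departs from it as the AU detection loss demands, which mirrors the hybrid predefined-plus-learned design the abstract describes.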
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 62106268), and the High-Level Talent Program for Innovation and Entrepreneurship (ShuangChuang Doctor) of Jiangsu Province (No. JSSCBS20211220). It was also partially supported by the National Natural Science Foundation of China (No. 62101555 and No. 62002360), the Natural Science Foundation of Jiangsu Province (No. BK20201346 and No. BK20210488), and the Fundamental Research Funds for the Central Universities (No. 2021QN+1072).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Shao, Z., Zhou, Y., Liu, B. et al. Facial action unit detection via hybrid relational reasoning. Vis Comput 38, 3045–3057 (2022). https://doi.org/10.1007/s00371-022-02527-w