Abstract
Facial action unit (AU) detection has been applied in a wide range of fields and has attracted great attention over the last decades. Most existing methods employ predefined regions of interest with the same number and range for all samples. However, we find that the flexibility of predefined regions of interest is limited, as different AUs may not occur simultaneously and their ranges change as intensity changes. In addition, many AU detection works design feature extraction modules and classifiers independently for each AU, which incurs high computation cost and ignores the dependencies among different AUs. In view of the limited flexibility of predefined regions of interest, we propose difference saliency maps that do not depend on facial landmarks. They are spatial pixel-wise attention maps in which each element represents the importance of the corresponding pixel in the entire image; consequently, all regions of interest can be irregular. To address the high computation cost, we combine group convolution with skip connections to build a lightweight network better suited to AU detection. All AUs share features and there is only one classifier, so the computation cost and the number of parameters are greatly reduced. In particular, the difference saliency maps and the global feature maps are combined to obtain regional enhancement features. To maximize the enhancement effect, the down-sampled difference saliency maps are added to multiple blocks of the lightweight network, and the enhanced global features are sent directly to the classifier for AU detection. By changing the number of neurons in the classifier, our framework can easily adapt to different datasets. Extensive experimental results show that the proposed framework substantially outperforms classic deep learning methods when evaluated on the DISFA+ and CK+ datasets.
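The enhancement step described above can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: it assumes the difference saliency map is a normalized absolute difference against a neutral reference frame, that down-sampling is average pooling, and that the map acts as a residual (multiplicative) attention on the feature maps. All array shapes and function names are hypothetical.

```python
import numpy as np

def difference_saliency(expression, neutral):
    """Pixel-wise saliency as the normalized absolute difference between an
    expression frame and a neutral reference frame (illustrative assumption)."""
    diff = np.abs(expression.astype(np.float32) - neutral.astype(np.float32))
    return diff / (diff.max() + 1e-8)  # scale to [0, 1]

def downsample(saliency, factor):
    """Average-pool the saliency map so it matches a block's spatial size."""
    h, w = saliency.shape
    s = saliency[:h - h % factor, :w - w % factor]
    return s.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def enhance(features, saliency):
    """Residual attention: keep the global features, amplify salient regions.
    features: (C, H, W); saliency: (H, W) broadcast over channels."""
    return features * (1.0 + saliency[None, :, :])

# Toy usage: an 8x8 face with activity in one small region (e.g. a brow raise).
neutral = np.zeros((8, 8))
expression = np.zeros((8, 8))
expression[2:4, 2:4] = 1.0
sal = difference_saliency(expression, neutral)

# Down-sampled map injected into a deeper 4x4 block with 4 channels.
feats = np.ones((4, 4, 4))
out = enhance(feats, downsample(sal, 2))
```

Because the attention is additive around 1 rather than a pure mask, non-salient regions keep their original feature values while salient ones are amplified, which matches the abstract's framing of "regional enhancement" on top of global features.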
With the difference saliency maps added, the detection results surpass those of state-of-the-art AU detection methods. Further experiments demonstrate that our network is more efficient in terms of parameter count, computational complexity, and inference time.
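The efficiency gain from group convolution can be made concrete with a small parameter-count calculation. The channel widths and group count below are hypothetical, chosen only to illustrate the mechanism; the paper's actual configuration may differ.

```python
def conv_params(c_in, c_out, k=3, groups=1):
    """Weight count of a k x k convolution (bias omitted). Each of the
    `groups` groups connects c_in/groups inputs to c_out/groups outputs."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

standard = conv_params(128, 128)           # ordinary 3x3 convolution
grouped = conv_params(128, 128, groups=4)  # group convolution, 4 groups
print(standard, grouped, standard // grouped)
```

Splitting the channels into g groups divides both the parameter count and the multiply-accumulate cost of the layer by g, which is why combining group convolution with skip connections (to restore cross-group information flow) yields a lightweight backbone.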
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Cite this article
Chen, J., Wang, C., Wang, K. et al. Lightweight network architecture using difference saliency maps for facial action unit detection. Appl Intell 52, 6354–6375 (2022). https://doi.org/10.1007/s10489-021-02755-y