
Lightweight network architecture using difference saliency maps for facial action unit detection


Abstract

Facial action unit (AU) detection has been applied in a wide range of fields and has attracted great attention over the past few decades. Most existing methods employ predefined regions of interest with the same number and range for all samples. However, we find that the flexibility of predefined regions of interest is limited, as different AUs may not occur simultaneously and their ranges change as their intensities change. In addition, many AU detection works design independent feature extraction modules and classifiers for each AU, which incurs high computational cost and ignores the dependencies among different AUs. In view of the limited flexibility of predefined regions of interest, we propose difference saliency maps that do not depend on facial landmarks. They are spatial pixel-wise attention maps, in which each element represents the importance of the corresponding pixel of the entire image; consequently, all regions of interest can be irregular. To address the problem of high computational cost, we combine group convolution with skip connections to build a lightweight network better suited to AU detection: all AUs share features and there is only one classifier, so the computational cost and the number of parameters are greatly reduced. In particular, the difference saliency maps and the global feature maps are combined to obtain regionally enhanced features. To maximize the enhancement effect, down-sampled difference saliency maps are added to multiple blocks of the lightweight network, and the enhanced global features are sent directly to the classifier for AU detection. By changing the number of neurons in the classifier, our framework easily adapts to different datasets. Extensive experiments show that the proposed framework clearly outperforms classic deep learning methods on the DISFA+ and CK+ datasets; with the difference saliency maps added, its detection results surpass state-of-the-art AU detection methods. Further experiments demonstrate that our network is more efficient in parameter usage, computational complexity, and inference time.
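The authors' implementation is not included on this page. As a minimal sketch of the two mechanisms the abstract describes (a down-sampled difference saliency map used as spatial pixel-wise attention inside lightweight blocks built from group convolution and a skip connection, and a single shared multi-label classifier whose output width is the number of AUs in the target dataset), one might write the following in PyTorch. All names such as `SaliencyGatedBlock` and `LightweightAUNet`, the channel and group counts, and the frame-difference stand-in for the saliency map are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch only; not the authors' code. It shows how a down-sampled
# single-channel saliency map can gate lightweight blocks that combine group
# convolution with a skip connection, with one shared classifier for all AUs.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SaliencyGatedBlock(nn.Module):
    """Group convolution + skip connection, modulated by a saliency map."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                              padding=1, groups=groups, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
        # Down-sample the saliency map to this block's resolution and use it
        # as spatial pixel-wise attention on the feature maps.
        s = F.interpolate(saliency, size=x.shape[-2:], mode='bilinear',
                          align_corners=False)
        out = F.relu(self.bn(self.conv(x)))
        out = out * s            # enhance salient (possibly irregular) regions
        return out + x           # skip connection keeps gradients flowing


class LightweightAUNet(nn.Module):
    def __init__(self, num_aus: int = 12, channels: int = 64):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=7, stride=2, padding=3)
        self.blocks = nn.ModuleList(
            [SaliencyGatedBlock(channels) for _ in range(4)])
        # One shared classifier for all AUs: adapting to another dataset only
        # requires changing num_aus (the number of output neurons).
        self.classifier = nn.Linear(channels, num_aus)

    def forward(self, image: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
        x = self.stem(image)
        for block in self.blocks:        # saliency injected into every block
            x = block(x, saliency)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return torch.sigmoid(self.classifier(x))


# Hypothetical usage: an expressive/neutral frame difference stands in for the
# difference saliency map described in the abstract.
net = LightweightAUNet(num_aus=12)
frames = torch.rand(2, 3, 224, 224)
neutral = torch.rand(2, 3, 224, 224)
saliency = (frames - neutral).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
probs = net(frames, saliency)  # (2, 12): one probability per AU
```

The usage lines show the intended shape contract under these assumptions: a single-channel saliency map that broadcasts over the feature channels, and one sigmoid output per AU so that co-occurring AUs are predicted jointly by the single shared classifier rather than by per-AU branches.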





Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kejun Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chen, J., Wang, C., Wang, K. et al. Lightweight network architecture using difference saliency maps for facial action unit detection. Appl Intell 52, 6354–6375 (2022). https://doi.org/10.1007/s10489-021-02755-y


