
Facial action unit detection methodology with application in Brazilian sign language recognition

  • Original Article
  • Published in: Pattern Analysis and Applications

Abstract

Sign language is the linguistic system adopted by Deaf communities to communicate. The lack of fully fledged Automatic Sign Language Recognition (ASLR) technologies contributes to the many difficulties that deaf individuals face in the absence of an interpreter, such as in private health appointments or in emergency situations. A challenging problem in the development of reliable ASLR systems is that sign languages rely not only on manual gestures but also on facial expressions and other non-manual markers. This paper proposes adopting the Facial Action Coding System (FACS) to encode sign language facial expressions. However, state-of-the-art Action Unit (AU) recognition models mostly target about two dozen AUs, typically those related to the expression of emotion. We adopted Brazilian Sign Language (Libras) as our case study and identified more than one hundred AUs, many of which are shared with other sign languages. We then implemented and evaluated a novel AU recognition architecture that combines SqueezeNet with geometric features. Our model achieved 88% accuracy over 119 classes. Combined with state-of-the-art gesture recognition, our model is ready to improve sign disambiguation and to advance ASLR.
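
To make the two-branch design concrete, below is a minimal Keras sketch of an AU classifier in the spirit the abstract describes: a SqueezeNet-style convolutional branch over the face crop, fused by concatenation with a dense branch over landmark-derived geometric features, ending in a 119-way softmax. The input resolution, branch widths, geometric feature dimensionality, and fusion-by-concatenation are illustrative assumptions, not the authors' exact configuration; their actual implementation is linked in the Notes below.

```python
# Illustrative sketch only: a two-branch AU classifier combining a
# SqueezeNet-style convolutional branch with geometric features.
# Layer sizes, input shapes, and the fusion strategy are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 119      # number of AU classes reported in the paper
GEOM_FEATURES = 136    # assumed: x,y coordinates of 68 facial landmarks

def fire_module(x, squeeze, expand):
    """SqueezeNet 'fire' module: 1x1 squeeze, then parallel 1x1/3x3 expand."""
    s = layers.Conv2D(squeeze, 1, activation="relu")(x)
    e1 = layers.Conv2D(expand, 1, activation="relu", padding="same")(s)
    e3 = layers.Conv2D(expand, 3, activation="relu", padding="same")(s)
    return layers.Concatenate()([e1, e3])

# Convolutional branch over the face crop (input size is an assumption).
img_in = layers.Input(shape=(96, 96, 3), name="face_image")
x = layers.Conv2D(64, 3, strides=2, activation="relu")(img_in)
x = layers.MaxPooling2D(3, strides=2)(x)
x = fire_module(x, 16, 64)
x = fire_module(x, 32, 128)
x = layers.MaxPooling2D(3, strides=2)(x)
x = fire_module(x, 48, 192)
x = layers.GlobalAveragePooling2D()(x)

# Dense branch over landmark-derived geometric features.
geo_in = layers.Input(shape=(GEOM_FEATURES,), name="geometric_features")
g = layers.Dense(128, activation="relu")(geo_in)

# Late fusion by concatenation, then classification over the AU classes.
merged = layers.Concatenate()([x, g])
out = layers.Dense(NUM_CLASSES, activation="softmax")(merged)

model = Model(inputs=[img_in, geo_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Late fusion keeps the appearance and geometric representations separate until the classification layer, so the landmark-based branch can contribute when texture cues alone are ambiguous, which is one plausible motivation for combining the two feature types.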



Notes

  1. Code available at https://github.com/SrtaEmely/AUdetectionForLibras.
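
The abstract refers to geometric features derived from the face; as a starting point before consulting the repository, the sketch below shows one plausible way to compute such features, assuming dlib's standard 68-point shape predictor. The specific feature set (inter-ocular-normalized distances from the nose tip) is an illustrative assumption, not the feature set used in the paper.

```python
# Illustrative sketch, assuming dlib's 68-point predictor: derive simple
# geometric features (normalized inter-landmark distances) from a face
# image. The feature definition here is an assumption, not the paper's.
import math
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# shape_predictor_68_face_landmarks.dat must be downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def geometric_features(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None  # no face detected
    shape = predictor(gray, faces[0])
    pts = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    # Normalize by inter-ocular distance (outer eye corners: points 36, 45)
    # to make the features scale-invariant.
    iod = math.dist(pts[36], pts[45])
    # Example features: distances from the nose tip (point 30) to every landmark.
    return [math.dist(pts[30], p) / iod for p in pts]
```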


Acknowledgements

The research for this paper was financially supported by the Coordination for the Improvement of Higher Education Personnel (CAPES), Brazil.

Author information


Corresponding author

Correspondence to Emely Pujólli da Silva.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

da Silva, E.P., Costa, P.D.P., Kumada, K.M.O. et al. Facial action unit detection methodology with application in Brazilian sign language recognition. Pattern Anal Applic 25, 549–565 (2022). https://doi.org/10.1007/s10044-021-01024-5

