Facial Expression Recognition Based on Deep Spatio-Temporal Attention Network

Li, Shuqin; Zheng, Xiangwei; Zhang, Xia; Chen, Xuanchi; Li, Wei

doi:10.1007/978-3-031-24386-8_28

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 461)

Included in the following conference series:

International Conference on Collaborative Computing: Networking, Applications and Worksharing

Abstract

Facial expression recognition plays a critical role in human-computer interaction. Existing methods tend to focus on a single type of facial feature and do not take full advantage of the combined spatio-temporal features of facial expression images. This paper therefore proposes a facial expression recognition method based on a deep spatio-temporal attention network (STANER) that captures the spatio-temporal features of facial expressions as they change subtly. First, a facial expression recognition model with an attention module based on spatial global features (SGAER) is built. The attention module quantifies the importance of each part of the expression feature map, which allows the model to extract the spatial global appearance features of subtle expression changes from a single-frame expression image. Then, a facial expression recognition model with C-LSTM based on temporal local features (TLER) is built to process image sequences of the facial regions involved in producing expressions and to extract dynamic, local temporal information about them. Experiments are carried out on the CK+ and Oulu-CASIA datasets. The results show that STANER achieves better performance, with accuracy rates of 98.23% and 89.52% on these two mainstream datasets, respectively.
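
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the attention module or of the C-LSTM, so the code assumes a generic softmax spatial attention and substitutes a plain (non-convolutional) LSTM cell. The function names, the scoring vector `w`, and the `params` layout are all hypothetical, and only the Python standard library is used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(feature_map, w):
    """Weight each spatial location of an H x W x C feature map.

    feature_map: nested list indexed as [row][col][channel].
    w: hypothetical length-C scoring vector (an assumption of this sketch).
    Returns (attention weights as H x W nested list, attended length-C vector).
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(w)
    # One scalar score per location: dot product of its channel vector with w.
    scores = [sum(f * wi for f, wi in zip(feature_map[i][j], w))
              for i in range(height) for j in range(width)]
    alpha = softmax(scores)  # importance of each location, sums to 1
    # Attended descriptor: attention-weighted sum of the channel vectors.
    pooled = [0.0] * channels
    for k, a in enumerate(alpha):
        i, j = divmod(k, width)
        for c in range(channels):
            pooled[c] += a * feature_map[i][j][c]
    weights = [alpha[i * width:(i + 1) * width] for i in range(height)]
    return weights, pooled


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, params):
    """One step of a standard LSTM cell (plain, not convolutional).

    params maps each gate name ('i', 'f', 'o', 'g') to a tuple (Wx, Wh, b):
    Wx is [hidden][input], Wh is [hidden][hidden], b is [hidden].
    """
    def affine(gate):
        wx, wh, b = params[gate]
        return [sum(a * xv for a, xv in zip(wx[k], x))
                + sum(a * hv for a, hv in zip(wh[k], h)) + b[k]
                for k in range(len(b))]

    i_gate = [sigmoid(v) for v in affine('i')]    # input gate
    f_gate = [sigmoid(v) for v in affine('f')]    # forget gate
    o_gate = [sigmoid(v) for v in affine('o')]    # output gate
    g_cand = [math.tanh(v) for v in affine('g')]  # candidate cell update
    c_new = [f * cv + i * g for f, cv, i, g in zip(f_gate, c, i_gate, g_cand)]
    h_new = [o * math.tanh(cv) for o, cv in zip(o_gate, c_new)]
    return h_new, c_new


# Usage: attend over a toy 2 x 2 x 2 feature map, then feed the attended
# vectors of a short hypothetical "sequence" through the LSTM cell.
toy_map = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
weights, pooled = spatial_attention(toy_map, w=[1.0, 0.0])

zero = [[0.0, 0.0]]
toy_params = {g: (zero, [[0.0]], [0.0]) for g in ('i', 'f', 'o', 'g')}
h, c = [0.0], [0.0]
for frame_vector in [pooled, pooled]:
    h, c = lstm_step(frame_vector, h, c, toy_params)
```

When the scoring vector is all zeros, every location receives the same weight and the attended vector reduces to the spatial mean of the feature map, which is a convenient sanity check. How STANER combines its spatial and temporal branches is not described in the abstract, so no fusion step is sketched here.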

Acknowledgements

This work is supported by the Shandong Provincial Project of Graduate Education Quality Improvement (No. SDYJG21104, No. SDYJG19171, No. SDYY18058), the OMO Course Group “Advanced Computer Networks” of Shandong Normal University, the Teaching Team Project of Shandong Normal University, the Teaching Research Project of Shandong Normal University (2018Z29), the Provincial Research Project of Education and Teaching (No. 2020JXY012), and the Natural Science Foundation of Shandong Province (No. ZR2020LZH008, ZR2021MF118, ZR2019MF071).

Author information

Authors and Affiliations

School of Information Science and Engineering, Shandong Normal University, Jinan, China
Shuqin Li, Xiangwei Zheng & Xuanchi Chen
Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, China
Shuqin Li, Xiangwei Zheng & Xuanchi Chen
Internet Diagnosis and Treatment Center, Taian City Central Hospital, Taian, China
Xia Zhang
Shandong Normal University Library, Shandong Normal University, Jinan, China
Wei Li

Corresponding authors

Correspondence to Xiangwei Zheng or Wei Li.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
Xi’an Jiaotong-Liverpool University, Suzhou, China
Xinheng Wang
Zhejiang University City College, Hangzhou, China
Wei Wei
London South Bank University, London, UK
Tasos Dagiuklas

About this paper

Cite this paper

Li, S., Zheng, X., Zhang, X., Chen, X., Li, W. (2022). Facial Expression Recognition Based on Deep Spatio-Temporal Attention Network. In: Gao, H., Wang, X., Wei, W., Dagiuklas, T. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 461. Springer, Cham. https://doi.org/10.1007/978-3-031-24386-8_28

DOI: https://doi.org/10.1007/978-3-031-24386-8_28
Published: 25 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24385-1
Online ISBN: 978-3-031-24386-8
eBook Packages: Computer Science, Computer Science (R0)
