Two-Stream Interactive Memory Network for Video Facial Expression Recognition

Chen, Lingyu; Ouyang, Yong; Xu, Ranyi; Sun, Sisi; Zeng, Yawen

doi:10.1007/978-3-031-15934-3_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13531)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Video facial expression recognition is widely applied in human-computer interaction, psychology, and other fields. Existing methods are generally based on LSTMs or CNNs, but these frameworks remain underdeveloped for two reasons: 1) some have a small memory capacity, and their memory, encoded in hidden states, cannot precisely retain past changes; 2) others focus only on the local appearance of faces. Exploiting longer-range dynamic facial changes and refining local information in video is therefore a non-trivial task.

To address these problems, this paper proposes a two-stream interactive memory network based on channel/spatial attention (TM-CSA). Specifically, a channel attention module extracts more distinctive features across channels, and a spatial attention module encodes the pixel-level context of the entire image. In this way, the interactive memory module of TM-CSA mines the interactions and correlations within and between images. As a result, TM-CSA is able to remember enough past facts while reducing information redundancy. Experimental results on three public datasets, JAFFE, CK+ and ImaSeDS, show that TM-CSA achieves better performance.
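
As a rough illustration of the two attention ideas named in the abstract, the following minimal Python sketch gates a small feature map per channel and per pixel. It is not the TM-CSA implementation: the function names, the scalar `weights` that stand in for a learned layer, and the use of plain averaging for the spatial gate are all assumptions made for illustration, and the interactive memory module is not covered.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash a score into a (0, 1) gate."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, weights):
    """Rescale each channel by a gate computed from its global average.

    feat:    list of C channels, each an H x W list of lists of floats.
    weights: hypothetical per-channel scalars standing in for a learned layer.
    """
    gated = []
    for channel, w in zip(feat, weights):
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count  # global average pooling
        gate = sigmoid(w * mean)                         # one gate per channel
        gated.append([[gate * v for v in row] for row in channel])
    return gated


def spatial_attention(feat):
    """Rescale each pixel by a gate computed from its mean across channels.

    This is a simplification: no convolution is applied to the pooled map.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gates = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[gates[i][j] * feat[k][i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


if __name__ == "__main__":
    # Toy feature map: 2 channels of size 2 x 2 (values are arbitrary).
    toy = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
    print(channel_attention(toy, weights=[1.0, 1.0]))
    print(spatial_attention(toy))
```

Both functions return a feature map of the same shape as the input, so in this sketch they can be applied one after the other or side by side as two separate streams.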

Acknowledgements

This work was supported by the National College Student Innovation and Entrepreneurship Training Program (S202010500049).

Author information

Authors and Affiliations

Hubei University of Technology, Wuhan, 430068, China
Lingyu Chen & Yong Ouyang
Hunan University, Changsha, 410012, China
Ranyi Xu
Minzu University of China, Beijing, 100081, China
Sisi Sun
Tencent Inc., Shenzhen, 518000, China
Yawen Zeng

Corresponding authors

Correspondence to Yong Ouyang or Yawen Zeng.

Editor information

Editors and Affiliations

University of the West of England, Bristol, UK
Elias Pimenidis
Lancaster University, Lancaster, UK
Plamen Angelov
Digital Innovation, Teesside University, Middlesbrough, UK
Chrisina Jayne
Democritus University of Thrace, Xanthi, Greece
Antonios Papaleonidas
The University of the West of England, Bristol, UK
Mehmet Aydin

About this paper

Cite this paper

Chen, L., Ouyang, Y., Xu, R., Sun, S., Zeng, Y. (2022). Two-Stream Interactive Memory Network for Video Facial Expression Recognition. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13531. Springer, Cham. https://doi.org/10.1007/978-3-031-15934-3_25

DOI: https://doi.org/10.1007/978-3-031-15934-3_25
Published: 15 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15933-6
Online ISBN: 978-3-031-15934-3
eBook Packages: Computer Science, Computer Science (R0)
