
RES-CapsNet: an improved capsule network for micro-expression recognition

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Micro-expressions are brief, involuntary facial expressions that reveal the genuine emotions a person attempts to conceal. Despite substantial progress, micro-expression recognition remains challenging owing to its low intensity and short duration. In this paper, we investigate micro-expression recognition with deep learning techniques and present RES-CapsNet, an improved capsule network that employs Res2Net as the backbone to extract multi-level, multi-scale features. RES-CapsNet further adds a squeeze-and-excitation (SE) block to the primary capsule layer (PrimaryCaps); benefiting from the SE block, valuable micro-expression features are highlighted and useless ones are suppressed. In addition, between the first convolutional layer and PrimaryCaps we introduce an efficient channel attention (ECA) module that adds only a few parameters while markedly improving performance. The proposed architecture first extracts the apex frame from each micro-expression sequence to capture the most distinct facial muscle movements, then feeds the pre-processed image into RES-CapsNet for further feature extraction and classification. The Leave-One-Subject-Out (LOSO) cross-validation strategy is applied on three prevalent spontaneous micro-expression databases (CASME II, SMIC, and SAMM) to assess the feasibility of RES-CapsNet. Extensive experiments demonstrate that RES-CapsNet captures considerable micro-expression detail and achieves markedly higher performance than the baseline CapsuleNet.
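The channel-attention idea behind the SE block described above can be sketched as follows. This is a minimal NumPy illustration of squeeze (global average pooling), excitation (a bottleneck MLP with a sigmoid gate), and channel rescaling; the weights are random placeholders for illustration only, not the paper's trained parameters, and the function name and shapes are our own assumptions.

```python
import numpy as np

def se_block(x, reduction=4, rng=None):
    """Squeeze-and-excitation channel attention, sketched in NumPy.
    x: feature map of shape (C, H, W). Weights are random for illustration."""
    rng = np.random.default_rng(0) if rng is None else rng
    c = x.shape[0]
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (C -> C/r -> C) with ReLU then sigmoid
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = np.maximum(w1 @ z, 0.0)              # ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ s)))      # sigmoid gate, each entry in (0, 1)
    # Scale: reweight each input channel by its learned gate
    return x * g[:, None, None]

x = np.ones((8, 4, 4))
y = se_block(x)
print(y.shape)
```

Because each gate lies in (0, 1), informative channels can be passed through nearly unchanged while uninformative ones are attenuated, which is the effect the abstract attributes to the SE block.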

Data availability

The datasets used in our paper (CASME II, SAMM, and SMIC) are publicly available.
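The LOSO evaluation protocol used with these datasets can be sketched in plain Python: each unique subject is held out as the test set in turn while all other subjects form the training set, so no subject appears in both splits. The function name and the toy sample list below are illustrative assumptions, not the paper's code.

```python
def loso_splits(samples):
    """Leave-One-Subject-Out splits.
    samples: list of (subject_id, label) pairs.
    Yields (held_out_subject, train_indices, test_indices)."""
    subjects = sorted({subj for subj, _ in samples})
    for held_out in subjects:
        train = [i for i, (subj, _) in enumerate(samples) if subj != held_out]
        test = [i for i, (subj, _) in enumerate(samples) if subj == held_out]
        yield held_out, train, test

# Toy example: three subjects, four clips
data = [("sub01", "positive"), ("sub01", "negative"),
        ("sub02", "surprise"), ("sub03", "positive")]
folds = list(loso_splits(data))
for subj, train_idx, test_idx in folds:
    print(subj, train_idx, test_idx)
```

LOSO is the standard protocol for these small micro-expression databases because it measures generalization to unseen subjects rather than to unseen clips of already-seen subjects.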


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62276118.

Funding

The National Natural Science Foundation of China, Grant No. 62276118.

Author information

Contributions

XS: Supervision, Writing—review and editing, Investigation. JL: Writing—original draft, Software, Methodology. LS: Conceptualization, Validation. SH: Data curation, Resources, Validation.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shu, X., Li, J., Shi, L. et al. RES-CapsNet: an improved capsule network for micro-expression recognition. Multimedia Systems 29, 1593–1601 (2023). https://doi.org/10.1007/s00530-023-01068-z

