Video Based Emotion Recognition Using CNN and BRNN

  • Conference paper
Pattern Recognition (CCPR 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)


Abstract

Video-based emotion recognition is a rather challenging computer vision task. It not only needs to model the spatial information of each image frame, but also requires considering the temporal contextual correlations among sequential frames. For this purpose, we propose a hierarchical deep network architecture to extract high-level spatial-temporal features. In this architecture, two classic deep neural networks, convolutional neural networks (CNN) and bi-directional recurrent neural networks (BRNN), are employed to capture facial textural characteristics in the spatial domain and dynamic emotion changes in the temporal domain, respectively. We endeavor to coordinate the two networks by optimizing each of them, so as to boost emotion recognition performance. In the challenge competition, our method achieves promising performance compared with the baselines.
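The pipeline the abstract describes (per-frame CNN features fed to a bidirectional RNN, whose states are pooled for classification) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the random vectors stand in for CNN outputs, and all sizes (16 frames, 128-dim features, 32 hidden units, 8 emotion classes) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_pass(X, Wx, Wh, b):
    """Run a simple tanh RNN over time; X is (T, D). Returns (T, H) hidden states."""
    T = X.shape[0]
    H = Wh.shape[0]
    h = np.zeros(H)
    out = np.zeros((T, H))
    for t in range(T):
        h = np.tanh(X[t] @ Wx + h @ Wh + b)
        out[t] = h
    return out

def brnn_features(X, params):
    """Bidirectional pass: concatenate forward and backward hidden states per frame."""
    fwd = rnn_pass(X, *params["fwd"])
    bwd = rnn_pass(X[::-1], *params["bwd"])[::-1]  # run on reversed sequence, restore order
    return np.concatenate([fwd, bwd], axis=1)      # (T, 2H)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: 16 frames, 128-dim CNN features, 32 hidden units, 8 classes.
T, D, H, C = 16, 128, 32, 8
X = rng.standard_normal((T, D))  # stand-in for per-frame CNN features

def init(D, H):
    return (rng.standard_normal((D, H)) * 0.1,
            rng.standard_normal((H, H)) * 0.1,
            np.zeros(H))

params = {"fwd": init(D, H), "bwd": init(D, H)}
feats = brnn_features(X, params)              # (16, 64) spatial-temporal features
pooled = feats.mean(axis=0)                   # average-pool over time
W_cls = rng.standard_normal((2 * H, C)) * 0.1
probs = softmax(pooled @ W_cls)               # emotion class probabilities
```

The key property of the bidirectional pass is that each frame's representation depends on both earlier and later frames, which is what lets the model exploit temporal context in both directions.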



Acknowledgement

This work was supported by the National Basic Research Program of China under Grant 2015CB351704, the National Natural Science Foundation of China (NSFC) under Grants 61231002 and 61572009, and the Natural Science Foundation of Jiangsu Province under Grant BK20130020.

Author information

Corresponding author

Correspondence to Wenming Zheng.


Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Cai, Y., Zheng, W., Zhang, T., Li, Q., Cui, Z., Ye, J. (2016). Video Based Emotion Recognition Using CNN and BRNN. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_56

  • DOI: https://doi.org/10.1007/978-981-10-3005-5_56

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3004-8

  • Online ISBN: 978-981-10-3005-5

