Abstract
Video-based emotion recognition is a challenging computer vision task: it requires not only modeling the spatial information within each image frame, but also capturing the temporal contextual correlations among sequential frames. For this purpose, we propose a hierarchical deep network architecture to extract high-level spatial-temporal features. In this architecture, two classic deep neural networks, convolutional neural networks (CNN) and bi-directional recurrent neural networks (BRNN), are employed to capture facial textural characteristics in the spatial domain and dynamic emotion changes in the temporal domain, respectively. We coordinate the two networks by optimizing each of them, so as to boost emotion recognition performance. In the challenge evaluation, our method achieves promising performance compared with the baselines.
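As a rough illustration of the pipeline the abstract describes, the following minimal PyTorch sketch wires a per-frame CNN into a bidirectional recurrent network. All layer sizes, the choice of a GRU as the recurrent cell, and the temporal mean-pooling are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

class CNNBRNNEmotionNet(nn.Module):
    """Hypothetical sketch of a hierarchical CNN + BRNN pipeline:
    a CNN extracts per-frame spatial features, a bidirectional RNN
    models temporal dynamics across the frame sequence."""

    def __init__(self, num_emotions=7, feat_dim=256, hidden_dim=128):
        super().__init__()
        # Spatial stream: a small CNN extracts facial features per frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)
        # Temporal stream: a bidirectional GRU reads the frame features
        # forwards and backwards, capturing contextual correlations.
        self.brnn = nn.GRU(feat_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_emotions)

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        x = self.cnn(frames.reshape(b * t, c, h, w)).flatten(1)  # (b*t, 64)
        x = self.proj(x).reshape(b, t, -1)                       # (b, t, feat_dim)
        out, _ = self.brnn(x)                                    # (b, t, 2*hidden_dim)
        # Pool the bidirectional states over time, then classify the clip.
        return self.classifier(out.mean(dim=1))
```

A typical call would be `CNNBRNNEmotionNet()(torch.randn(4, 16, 3, 64, 64))` for a batch of four 16-frame clips, yielding one emotion score vector per clip.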
Acknowledgement
This work was supported by the National Basic Research Program of China under Grant 2015CB351704, the National Natural Science Foundation of China (NSFC) under Grants 61231002 and 61572009, and the Natural Science Foundation of Jiangsu Province under Grant BK20130020.
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cai, Y., Zheng, W., Zhang, T., Li, Q., Cui, Z., Ye, J. (2016). Video Based Emotion Recognition Using CNN and BRNN. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_56
DOI: https://doi.org/10.1007/978-981-10-3005-5_56
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science (R0)