Abstract
Video-based emotion recognition is a challenging computer vision task: it requires not only modeling the spatial information within each image frame, but also capturing the temporal contextual correlations among sequential frames. For this purpose, we propose a hierarchical deep network architecture to extract high-level spatial-temporal features. In this architecture, two classic deep neural networks, convolutional neural networks (CNN) and bi-directional recurrent neural networks (BRNN), are employed to capture facial textural characteristics in the spatial domain and dynamic emotion changes in the temporal domain, respectively. We coordinate the two networks by optimizing each of them, so as to boost emotion recognition performance. In the challenge evaluation, our method achieves promising performance compared with the baselines.
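As a rough illustration of the pipeline the abstract describes, the following minimal PyTorch sketch wires a per-frame CNN into a bidirectional recurrent network. All layer sizes, the choice of a GRU as the recurrent cell, and the temporal mean-pooling are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

class CNNBRNNEmotionNet(nn.Module):
    """Hypothetical sketch of a hierarchical CNN + BRNN pipeline:
    a CNN extracts per-frame spatial features, a bidirectional RNN
    models temporal dynamics across the frame sequence."""

    def __init__(self, num_emotions=7, feat_dim=256, hidden_dim=128):
        super().__init__()
        # Spatial stream: a small CNN extracts facial features per frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)
        # Temporal stream: a bidirectional GRU reads the frame features
        # forwards and backwards, capturing contextual correlations.
        self.brnn = nn.GRU(feat_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_emotions)

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        x = self.cnn(frames.reshape(b * t, c, h, w)).flatten(1)  # (b*t, 64)
        x = self.proj(x).reshape(b, t, -1)                       # (b, t, feat_dim)
        out, _ = self.brnn(x)                                    # (b, t, 2*hidden_dim)
        # Pool the bidirectional states over time, then classify the clip.
        return self.classifier(out.mean(dim=1))
```

A typical call would be `CNNBRNNEmotionNet()(torch.randn(4, 16, 3, 64, 64))` for a batch of four 16-frame clips, yielding one emotion score vector per clip.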
Acknowledgement
This work was supported by the National Basic Research Program of China under Grant 2015CB351704, the National Natural Science Foundation of China (NSFC) under Grants 61231002 and 61572009, and the Natural Science Foundation of Jiangsu Province under Grant BK20130020.
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cai, Y., Zheng, W., Zhang, T., Li, Q., Cui, Z., Ye, J. (2016). Video Based Emotion Recognition Using CNN and BRNN. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_56
DOI: https://doi.org/10.1007/978-981-10-3005-5_56
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science (R0)