Abstract
In recent years, considerable research has been devoted to face detection and facial expression recognition, yet few methods achieve both in real time with high accuracy. In this paper we present a real-time, end-to-end, single-step face detection and facial expression recognition technique that runs at more than 10 fps (frames per second). We use an end-to-end deep learning approach for face localization and expression classification. On the CK+ [1] dataset we achieve a 10-fold cross-validation accuracy of 94.8% on 640 × 480 images. We have also built a webcam interface that classifies a person's emotion at 10 fps, supporting our claim that facial expression recognition has reached real-time speed with very decent accuracy.
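The single-step claim can be illustrated with a minimal throughput harness: one call returns both a face box and an expression label, and frames-per-second is measured over a batch of synthetic 640 × 480 frames. This is a sketch only; `single_step_predict` is a hypothetical stub standing in for the paper's actual network (a deep detector with an expression-classification head), not its implementation.

```python
import time
import numpy as np

# CK+-style expression categories (illustrative subset)
EXPRESSIONS = ["neutral", "anger", "disgust", "fear", "happy", "sadness", "surprise"]

def single_step_predict(frame):
    """Stub for the end-to-end model: a single forward pass yields
    both the face bounding box and the expression label.
    (Hypothetical placeholder, not the paper's network.)"""
    h, w = frame.shape[:2]
    box = (w // 4, h // 4, w // 2, h // 2)  # (x, y, width, height)
    label = EXPRESSIONS[0]                   # stub always returns "neutral"
    return box, label

def measure_fps(num_frames=30, shape=(480, 640, 3)):
    """Time the single-step pipeline over synthetic 640x480 frames
    and report throughput in frames per second."""
    frames = [np.zeros(shape, dtype=np.uint8) for _ in range(num_frames)]
    start = time.perf_counter()
    results = [single_step_predict(f) for f in frames]
    elapsed = time.perf_counter() - start
    return len(results) / elapsed, results

if __name__ == "__main__":
    fps, _ = measure_fps()
    print(f"throughput: {fps:.1f} fps")
```

With a real model substituted for the stub, the same loop (fed by a webcam capture instead of zero-filled arrays) gives the fps figure reported above.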
References
1. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In: Proceedings of the 3rd IEEE Workshop on CVPR for Human Communicative Behavior Analysis, San Francisco, CA, USA (2010)
2. Video and image based emotion recognition challenges in the wild: EmotiW 2015. In: ACM International Conference on Multimodal Interaction (ICMI) (2015)
3. Audio/visual emotion challenge and workshop: AVEC 2016. In: Proceedings of ACM Multimedia (2016)
4. Tian, Y.-L., Kanade, T., Cohn, J.F.: Recognizing action units for facial expression analysis. IEEE Trans. Pattern Anal. Mach. Intell. 23(2), 97–115 (2001)
5. Zhang, L., Tjondronegoro, D., Chandran, V.: Representation of facial expression categories in continuous arousal-valence space: feature and correlation. Image Vis. Comput. 32(12), 1067–1079 (2014)
6. Liu, M., Wang, R., Li, S., Shan, S., Huang, Z., Chen, X.: Combining multiple kernel methods on Riemannian manifold for emotion recognition in the wild. In: Proceedings of the 16th International Conference on Multimodal Interaction, ICMI 2014, pp. 494–501. ACM, New York (2014)
7. Sun, B., Li, L., Zuo, T., Chen, Y., Zhou, G., Wu, X.: Combining multimodal features with hierarchical classifier fusion for emotion recognition in the wild. In: Proceedings of the 16th International Conference on Multimodal Interaction, ICMI 2014, pp. 481–486. ACM, New York (2014)
8. Ng, H.-W., Nguyen, V.D., Vonikakis, V., Winkler, S.: Deep learning for emotion recognition on small datasets using transfer learning. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 443–449. ACM (2015)
9. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
10. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes challenge 2007 (VOC 2007) results (2007)
11. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR (2012)
12. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
15. Challenges in Representation Learning: Facial Expression Recognition Challenge. Kaggle Inc.
16. https://www.dropbox.com/s/xtr4yd4i5e0vw8g/iccv15_tutorial_training_rbg.pdf?dl=0
17. Ebrahimi Kahou, S., Michalski, V., Konda, K., Memisevic, R., Pal, C.: Recurrent neural networks for emotion recognition in video. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 467–474. ACM (2015)
18. Jung, H., Lee, S., Park, S., Lee, I., Ahn, C., Kim, J.: Deep temporal appearance-geometry network for facial expression recognition (2015). arXiv:1503.01532v1
19. Ghimire, D., Lee, H., Li, Z.-N., Heong, S., Park, S.H., Choi, H.S.: Recognition of facial expressions based on tracking and selection of discriminative geometric features. Int. J. Multimedia Ubiquit. Eng. 10(3), 35–44 (2015)
20. Korattikara, A., Rathod, V., Murphy, K., Welling, M.: Bayesian dark knowledge. In: NIPS (2015)
21. Yu, Z., Zhang, C.: Image based static facial expression recognition with multiple deep network learning. In: Proceedings of the 2015 ACM International Conference on Multimodal Interaction, pp. 435–442. ACM (2015)
22. Kim, B.-K., Lee, H., Roh, J., Lee, S.-Y.: Hierarchical committee of deep CNNs with exponentially-weighted decision fusion for static facial expression recognition. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, ICMI 2015, pp. 427–434. ACM, New York (2015)
23. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. IJCV 115, 211–252 (2015)
© 2017 Springer International Publishing AG
Cite this paper
Reddy, B., Kim, Y.-H., Yun, S., Jang, J., Hong, S. (2017). End to End Deep Learning for Single Step Real-Time Facial Expression Recognition. In: Nasrollahi, K., et al. (eds.) Video Analytics. Face and Facial Expression Recognition and Audience Measurement. VAAM 2016, FFER 2016. Lecture Notes in Computer Science, vol. 10165. Springer, Cham. https://doi.org/10.1007/978-3-319-56687-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56686-3
Online ISBN: 978-3-319-56687-0