Skip to main content

One-Shot Only Real-Time Video Classification: A Case Study in Facial Emotion Recognition

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12489))

Abstract

Video classification is an important research field due to its applications ranging from human action recognition for video surveillance to emotion recognition for human-computer interaction. This paper proposes a new method called One-Shot Only (OSO) for real-time video classification with a case study in facial emotion recognition. Instead of using 3D convolutional neural networks (CNN) or multiple 2D CNNs with decision fusion as in the previous studies, the OSO method tackles video classification as a single image classification problem by spatially rearranging video frames using frame selection or clustering strategies to form a simple representative storyboard for spatio-temporal video information fusion. It uses a single 2D CNN for video classification and thus can be optimised end-to-end directly in terms of the classification accuracy. Experimental results show that the OSO method proposed in this paper outperformed multiple 2D CNNs with decision fusion by a large margin in terms of classification accuracy (by up to 13%) on the AFEW 7.0 dataset for video classification. It is also very fast, up to ten times faster than the commonly used 2D CNN architectures for video classification.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Kim, B.-K., Roh, J., Dong, S.-Y., Lee, S.-Y.: Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. J. Multimodal User Interfaces 10(2), 173–189 (2016). https://doi.org/10.1007/s12193-015-0209-0

    Article  Google Scholar 

  2. Liu, C., Tang, T., Lv, K., Wang, M.: Multi-feature based emotion recognition for video clips. In: Proceedings of the ACM International Conference on Multimodal Interaction, pp. 630–634. ACM, Boulder (2018)

    Google Scholar 

  3. Lu, C., Zheng, W., Li, C., Tang, C., Liu, S., Yan, S., Zong, Y.: Multiple spatio-temporal feature learning for video-based emotion recognition in the wild. In: Proceedings of the ACM International Conference on Multimodal Interaction, pp. 646–652. ACM, Boulder (2018)

    Google Scholar 

  4. Knyazev, B., Shvetsov, R., Efremova, N., Kuharenko, A.: Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video. arXiv preprint arXiv:1711.04598 (2017)

  5. Bargal, S.A., Barsoum, E., Ferrer, C.C., Zhang, C.: Emotion recognition in the wild from videos using images. In: Proceedings of the ACM International Conference on Multimodal Interaction, pp. 433–436. ACM, Tokyo (2016)

    Google Scholar 

  6. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016)

    Google Scholar 

  7. Jing, L., Yang, X., Tian, Y.: Video you only look once: overall temporal convolutions for action recognition. J. Vis. Commun. Image Representation 52, 58–65 (2018)

    Article  Google Scholar 

  8. Samadiani, N., Huang, G., Cai, B., Luo, W., Chi, C.-H., Xiang, Y., He, J.: A review on automatic facial expression recognition systems assisted by multimodal sensor data. Sensors 19, 1863 (2019)

    Article  Google Scholar 

  9. Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 17, 124–129 (1971)

    Article  Google Scholar 

  10. Kahou, S.E., et al.: Combining modality specific deep neural networks for emotion recognition in video. In: Proceedings of the 15th ACM International conference on multimodal interaction, pp. 543–550. ACM, Sydney (2013)

    Google Scholar 

  11. Dhall, A., Goecke, R., Joshi, J., Wagner, M., Gedeon, T.: Emotion recognition in the wild challenge 2013. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 509–516. ACM, Sydney (2013)

    Google Scholar 

  12. Sikka, K., Dykstra, K., Sathyanarayana, S., Littlewort, G., Bartlett, M.: Multiple kernel learning for emotion recognition in the wild. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 517–524. ACM, Sydney (2013)

    Google Scholar 

  13. Liu, M., Wang, R., Huang, Z., Shan, S., Chen, X.: Partial least squares regression on grassmannian manifold for emotion recognition. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 525–530. ACM, Sydney (2013)

    Google Scholar 

  14. Chen, J., Chen, Z., Chi, Z., Fu, H.: Facial expression recognition in video with multiple feature fusion. IEEE Trans. Affect. Comput. 9, 38–50 (2018)

    Article  Google Scholar 

  15. Dhall, A., Murthy, O.V.R., Goecke, R., Joshi, J., Gedeon, T.: Video and image based emotion recognition challenges in the wild: EmotiW 2015. In: Proceedings of the ACM on International Conference on Multimodal Interaction, pp. 423–426. ACM, Seattle (2015)

    Google Scholar 

  16. Yang, B., Cao, J., Ni, R., Zhang, Y.: Facial expression recognition using weighted mixture deep neural network based on double-channel facial images. IEEE Access 6, 4630–4640 (2018)

    Article  Google Scholar 

  17. Doherty, A.R., Byrne, D., Smeaton, A.F., Jones, G.J.F., Hughes, M.: Investigating keyframe selection methods in the novel domain of passively captured visual lifelogs. In: Proceedings of the International Conference on Content-based Image and Video Retrieval, pp. 259–268. ACM, Niagara Falls (2008)

    Google Scholar 

  18. Guo, S.M., Pan, Y.A., Liao, Y.C., Hsu, C.Y., Tsai, J.S.H., Chang, C.I.: A key frame selection-based facial expression recognition system. In: Proceedings of ICICIC 2006 Innovative Computing, Information and Control, pp. 341–344 (2006)

    Google Scholar 

  19. Zhang, Q., Yu, S.-P., Zhou, D.-S., Wei, X.-P.: An efficient method of key-frame extraction based on a cluster algorithm. J. Hum. Kinet. 39, 5–14 (2013)

    Article  Google Scholar 

  20. Mollahosseini, A., Hasani, B., Mahoor, M.H.: Affectnet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 10, 18–31 (2019)

    Article  Google Scholar 

  21. Dhall, A., Goecke, R., Lucey, S., Gedeon, T.: Collecting large, richly annotated facial-expression databases from movies. IEEE Multimed. 19, 34–41 (2012)

    Article  Google Scholar 

  22. Shi, J., Tomasi, C.: Good features to track. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600 (1994)

    Google Scholar 

  23. Tomasi, C., Kanade, T.: Detection and tracking of point features. Technical report, Carnegie Mellon University (1991)

    Google Scholar 

  24. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23, 1499–1503 (2016)

    Article  Google Scholar 

  25. Ouyang, X., et al.: Audio-visual emotion recognition using deep transfer learning and multiple temporal models. In: Proceedings of the 19th ACM International Conference on Multimodal Interaction, pp. 577–582. ACM, Glasgow (2017)

    Google Scholar 

  26. Fan, Y., Lu, X., Li, D., Liu, Y.: Video-based emotion recognition using CNN-RNN and C3D hybrid networks. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 445–450. ACM, Tokyo (2016)

    Google Scholar 

  27. Vielzeuf, V., Pateux, S., Jurie, F.: Temporal multimodal fusion for video emotion classification in the wild. In: Proceedings of the 19th ACM International Conference on Multimodal Interaction, pp. 569–576. ACM, Glasgow (2017)

    Google Scholar 

  28. Fan, Y., Lam, Jacqueline C.K., Li, Victor O.K.: Multi-region ensemble convolutional neural network for facial expression recognition. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds.) ICANN 2018. LNCS, vol. 11139, pp. 84–94. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01418-6_9

    Chapter  Google Scholar 

  29. Yan, J., et al.: Multi-clue fusion for emotion recognition in the wild. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 458–463. ACM, Tokyo (2016)

    Google Scholar 

  30. Ding, W., et al.: Audio and face video emotion recognition in the wild using deep neural networks and small datasets. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 506–513. ACM, Tokyo (2016)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Arwa Basbrain or John Q. Gan .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Basbrain, A., Gan, J.Q. (2020). One-Shot Only Real-Time Video Classification: A Case Study in Facial Emotion Recognition. In: Analide, C., Novais, P., Camacho, D., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2020. IDEAL 2020. Lecture Notes in Computer Science(), vol 12489. Springer, Cham. https://doi.org/10.1007/978-3-030-62362-3_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-62362-3_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62361-6

  • Online ISBN: 978-3-030-62362-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics