Abstract
Many applications require action recognition skills, from human-machine interaction to intelligent video surveillance. Action recognition in video sequences cannot rely on simply processing raw color images or optical flow fields. Color images provide appearance information about moving objects but lack motion features; they are also very sensitive to variations in clothing and camera pose that badly affect action recognition accuracy. In turn, raw optical flow measures instantaneous motion rather than the overall dynamics of actions, and is sensitive to noise. More robust and meaningful motion features and classifiers are thus required for action recognition to be reliable. This paper proposes a new action recognition technique based on a deep convolutional neural network (CNN) fed with Histograms of Optical Flow Co-Occurrence (HOF-CO) motion features. HOF-CO is a robust motion representation previously proposed by the authors that encodes the relative frequency of pairs of optical flow directions computed at each image pixel. Experimental results show that this approach outperforms state-of-the-art action recognition methods on three public datasets: KTH, UCF-11 YouTube, and HOLLYWOOD2.
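The core idea of HOF-CO, as summarized above, is to histogram pairs of quantized optical flow directions at corresponding pixels. The sketch below illustrates that co-occurrence counting in NumPy; the bin count, the use of two consecutive flow fields, and the normalization are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def hofco_sketch(flow_a, flow_b, n_bins=8):
    """Minimal sketch of a flow-direction co-occurrence histogram.

    flow_a, flow_b: (H, W, 2) optical flow fields (u, v components)
    at two time steps. Each pixel's flow direction is quantized into
    one of n_bins orientation bins, and the histogram counts the pair
    (direction at t, direction at t+1) per pixel. Returns a normalized
    n_bins x n_bins matrix.
    """
    def quantize(flow):
        # Flow angle in [-pi, pi], mapped to an integer bin index.
        ang = np.arctan2(flow[..., 1], flow[..., 0])
        bins = np.floor((ang + np.pi) / (2 * np.pi) * n_bins).astype(int)
        return np.clip(bins, 0, n_bins - 1)

    qa, qb = quantize(flow_a), quantize(flow_b)
    hist = np.zeros((n_bins, n_bins))
    # Accumulate one count per pixel at (bin_t, bin_t+1).
    np.add.at(hist, (qa.ravel(), qb.ravel()), 1)
    return hist / hist.sum()
```

For example, a frame pair in which all pixels move rightward and then upward produces a single nonzero cell, whereas a complex action spreads mass across the matrix; it is this distribution that would feed the CNN.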
Rashwan, H.A., Garcia, M.A., Abdulwahab, S. et al. Action representation and recognition through temporal co-occurrence of flow fields and convolutional neural networks. Multimed Tools Appl 79, 34141–34158 (2020). https://doi.org/10.1007/s11042-020-09194-w