
Action representation and recognition through temporal co-occurrence of flow fields and convolutional neural networks

Published in: Multimedia Tools and Applications

Abstract

Many applications, from human-machine interaction to intelligent video surveillance, require action recognition capabilities. Action recognition in video sequences cannot rely on simply processing raw color images or optical flow fields. Color images convey the appearance of moving objects but lack motion features, and they are highly sensitive to variations in clothing and camera pose that degrade recognition accuracy. In turn, raw optical flow measures instantaneous motion rather than the overall dynamics of an action, and is sensitive to noise. More robust and meaningful motion features and classifiers are thus required for action recognition to be reliable. This paper proposes a new action recognition technique based on a deep convolutional neural network (CNN) fed with Histograms of Optical Flow Co-Occurrence (HOF-CO) motion features. HOF-CO is a robust motion representation previously proposed by the authors that encodes the relative frequency of pairs of optical flow directions computed at each image pixel. Experimental results show that this approach outperforms state-of-the-art action recognition methods on three public datasets: KTH, UCF-11 YouTube and Hollywood2.
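To make the descriptor concrete, the sketch below illustrates one plausible reading of the HOF-CO idea under stated assumptions: flow directions are quantized into a fixed number of angular bins, near-static pixels are discarded with a magnitude threshold, and the relative frequencies of (direction at frame t, direction at frame t + lag) pairs are accumulated into a co-occurrence matrix that is flattened into a feature vector for the CNN. The function name `hof_co` and the parameters `n_bins` and `mag_thresh` are illustrative assumptions, not the authors' published parameterization.

```python
import numpy as np

def hof_co(flow_t, flow_t_lag, n_bins=8, mag_thresh=0.5):
    """Minimal sketch of a Histogram of Optical Flow Co-Occurrence feature.

    flow_t, flow_t_lag : (H, W, 2) dense optical flow fields (u, v) at two
    time instants separated by a fixed temporal lag. The bin count,
    magnitude threshold and normalization are illustrative choices only.
    """
    def quantize(flow):
        u, v = flow[..., 0], flow[..., 1]
        mag = np.hypot(u, v)                  # flow magnitude per pixel
        ang = np.arctan2(v, u)                # flow direction in [-pi, pi]
        bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
        return bins, mag > mag_thresh         # ignore near-static pixels

    b1, m1 = quantize(flow_t)
    b2, m2 = quantize(flow_t_lag)
    valid = m1 & m2                           # pixels moving at both instants

    # Accumulate the co-occurrence matrix of direction pairs.
    co = np.zeros((n_bins, n_bins), dtype=np.float64)
    np.add.at(co, (b1[valid], b2[valid]), 1.0)

    total = co.sum()
    if total > 0:
        co /= total                           # relative frequencies
    return co.ravel()                         # flattened descriptor for the CNN
```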






Author information


Corresponding author

Correspondence to Hatem A. Rashwan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Rashwan, H.A., Garcia, M.A., Abdulwahab, S. et al. Action representation and recognition through temporal co-occurrence of flow fields and convolutional neural networks. Multimed Tools Appl 79, 34141–34158 (2020). https://doi.org/10.1007/s11042-020-09194-w
