
Video and Image Complexity in Human Action Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13055)

Abstract

We analyze the relation between video complexity and the performance of Human Action Recognition (HAR) algorithms. The rationale is that variations in image conditions (e.g. occlusion, camera movement, resolution, and illumination) and in image content (e.g. edge density and number of objects), both reflecting scene complexity, increase the difficulty of recognizing activities for a computational model. The HAR algorithms used in this work are improved Dense Trajectories (iDT) [25], Motion-Augmented RGB Stream for Action Recognition (MARS) [5], and SlowFast [7]; their performance is compared against the number of people and objects in the scene and against three statistical measures: entropy, number of regions, and edge density. The results so far show a correlation between complexity and classification performance. The Mask R-CNN runs used to count scene elements were carried out on the supercomputer cluster of LSC-INAOE.
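As a rough illustration of the statistical measures named above, the per-frame entropy and edge density could be computed along these lines. This is a minimal numpy sketch, not the paper's implementation: the paper relies on the Canny detector [6] and MSER regions [14], whereas this example substitutes a simple gradient-magnitude threshold for edge detection and omits region counting.

```python
import numpy as np

def shannon_entropy(gray):
    """Shannon entropy (bits) of an 8-bit grayscale frame's intensity histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]  # ignore empty bins; 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def edge_density(gray, threshold=30.0):
    """Fraction of pixels whose gradient magnitude exceeds a threshold
    (a crude stand-in for a Canny-based edge density)."""
    gy, gx = np.gradient(gray.astype(float))  # per-axis central differences
    mag = np.hypot(gx, gy)
    return float((mag > threshold).mean())

# Toy frame: left half black, right half white.
frame = np.zeros((64, 64), dtype=np.uint8)
frame[:, 32:] = 255
print(shannon_entropy(frame))  # 1.0: two equally likely intensities
print(edge_density(frame))     # small: edges only at the black/white boundary
```

For video, such measures would typically be averaged over frames before correlating them with per-class recognition accuracy.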

Supported by CONACyT.


References

  1. Adami, C.: What is complexity? BioEssays 24(12), 1085–1094 (2002)

  2. Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: a review. ACM Comput. Surv. (CSUR) 43(3), 1–43 (2011)

  3. Akpulat, M., Ekinci, M.: Detecting interaction/complexity within crowd movements using braid entropy. Front. Inf. Technol. Electron. Eng. 20(6), 849–861 (2019)

  4. Ali, S.: Measuring flow complexity in videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1097–1104 (2013)

  5. Crasto, N., Weinzaepfel, P., Alahari, K., Schmid, C.: MARS: motion-augmented RGB stream for action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7882–7891 (2019)

  6. Ding, L., Goshtasby, A.: On the Canny edge detector. Pattern Recogn. 34(3), 721–725 (2001)

  7. Feichtenhofer, C., Fan, H., Malik, J., He, K.: SlowFast networks for video recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6202–6211 (2019)

  8. Grünwald, P.D., Vitányi, P.M.: Kolmogorov complexity and information theory. With an interpretation in terms of questions and answers. J. Log. Lang. Inf. 12(4), 497–529 (2003)

  9. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)

  10. Hiremath, S.K., Plötz, T.: Deriving effective human activity recognition systems through objective task complexity assessment. Proc. ACM Interact. Mob. Wearable Ubiquit. Technol. 4(4), 1–24 (2020)

  11. Lee, J.S., Ebrahimi, T.: Perceptual video compression: a survey. IEEE J. Sel. Top. Signal Process. 6(6), 684–697 (2012)

  12. Lin, Z.Y., Chen, J.L., Chen, L.G.: A 203 FPS VLSI architecture of improved dense trajectories for real-time human action recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1115–1119. IEEE (2018)

  13. Luo, B., Li, H., Meng, F., Wu, Q., Ngan, K.N.: An unsupervised method to extract video object via complexity awareness and object local parts. IEEE Trans. Circ. Syst. Video Technol. 28(7), 1580–1594 (2017)

  14. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004)

  15. Mishra, A., Pandey, A., Murthy, H.A.: Zero-shot learning for action recognition using synthesized features. Neurocomputing 390, 117–130 (2020)

  16. Nagle, F., Lavie, N.: Predicting human complexity perception of real-world scenes. R. Soc. Open Sci. 7(5), 191487 (2020)

  17. Oliva, A., Mack, M.L., Shrestha, M., Peeper, A.: Identifying the perceptual dimensions of visual complexity of scenes. In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 26 (2004)

  18. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 (2015)

  19. Rosenholtz, R., Li, Y., Mansfield, J., Jin, Z.: Feature congestion: a measure of display clutter. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 761–770 (2005)

  20. Sahaf, Y., Krishnan, N.C., Cook, D.J.: Defining the complexity of an activity. In: Activity Context Representation (2011)

  21. Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)

  22. Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)

  23. Standish, R.K.: Concept and definition of complexity. In: Intelligent Complex Adaptive Systems, pp. 105–124. IGI Global (2008)

  24. Tokmakov, P., Hebert, M., Schmid, C.: Unsupervised learning of video representations via dense trajectory clustering. arXiv preprint arXiv:2006.15731 (2020)

  25. Wang, H., Oneata, D., Verbeek, J., Schmid, C.: A robust and efficient video representation for action recognition. Int. J. Comput. Vis. 119(3), 219–238 (2016)

  26. Wang, L., Koniusz, P., Huynh, D.Q.: Hallucinating IDT descriptors and I3D optical flow features for action recognition with CNNs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8698–8708 (2019)


Author information

Correspondence to Andrea Burgos-Madrigal .


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Burgos-Madrigal, A., Altamirano-Robles, L. (2021). Video and Image Complexity in Human Action Recognition. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2021. Lecture Notes in Computer Science, vol 13055. Springer, Cham. https://doi.org/10.1007/978-3-030-89691-1_34


  • DOI: https://doi.org/10.1007/978-3-030-89691-1_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89690-4

  • Online ISBN: 978-3-030-89691-1

  • eBook Packages: Computer Science, Computer Science (R0)
