Abstract
We analyze the relation between video complexity and the performance of Human Action Recognition (HAR) algorithms. The rationale is that variations in image conditions (e.g., occlusion, camera movement, resolution, and illumination) and in image content (e.g., edge density and number of objects), both of which reflect scene complexity, increase the difficulty of recognizing activities for a computational model. The HAR algorithms used in this work, improved Dense Trajectories (iDT) [25], Motion-Augmented RGB Stream for Action Recognition (MARS) [5], and SlowFast [7], are compared against the number of people and objects in the scene and against three statistical measures: entropy, number of regions, and edge density. The results so far show a correlation between complexity and classification performance. The Mask R-CNN runs for counting scene elements were carried out on the supercomputer cluster of LSC-INAOE.
Supported by CONACyT.
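Two of the statistical measures named in the abstract, entropy and edge density, can be computed directly from an image's intensity histogram and edge map. The sketch below is illustrative only, not the paper's implementation: the paper uses the Canny detector [6] for edges, whereas this dependency-free version substitutes a simple finite-difference gradient, and the function names and the threshold value are assumptions.

```python
import numpy as np

def shannon_entropy(gray):
    """Shannon entropy (bits) of an 8-bit grayscale image's intensity histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())

def edge_density(gray, threshold=30.0):
    """Fraction of pixels whose gradient magnitude exceeds `threshold`.

    Stand-in for the Canny-based edge density [6]: a finite-difference
    gradient keeps the sketch self-contained.
    """
    g = gray.astype(float)
    gx = np.abs(np.diff(g, axis=1))[:-1, :]  # horizontal differences
    gy = np.abs(np.diff(g, axis=0))[:, :-1]  # vertical differences
    mag = np.hypot(gx, gy)
    return float((mag > threshold).mean())

# A flat image has zero entropy and no edge pixels; a 0/255 striped
# image has exactly 1 bit of entropy and edges everywhere.
flat = np.full((64, 64), 128, dtype=np.uint8)
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (64, 32))
```

A higher-entropy, denser-edge frame is taken as more complex for the recognizer, which is the correlation the paper examines.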
References
Adami, C.: What is complexity? BioEssays 24(12), 1085–1094 (2002)
Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: a review. ACM Comput. Surv. (CSUR) 43(3), 1–43 (2011)
Akpulat, M., Ekinci, M.: Detecting interaction/complexity within crowd movements using braid entropy. Front. Inf. Technol. Electron. Eng. 20(6), 849–861 (2019)
Ali, S.: Measuring flow complexity in videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1097–1104 (2013)
Crasto, N., Weinzaepfel, P., Alahari, K., Schmid, C.: MARS: motion-augmented RGB stream for action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7882–7891 (2019)
Ding, L., Goshtasby, A.: On the canny edge detector. Pattern Recogn. 34(3), 721–725 (2001)
Feichtenhofer, C., Fan, H., Malik, J., He, K.: SlowFast networks for video recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6202–6211 (2019)
Grünwald, P.D., Vitányi, P.M.: Kolmogorov complexity and information theory. With an interpretation in terms of questions and answers. J. Log. Lang. Inf. 12(4), 497–529 (2003)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
Hiremath, S.K., Plötz, T.: Deriving effective human activity recognition systems through objective task complexity assessment. Proc. ACM Interact. Mob. Wearable Ubiquit. Technol. 4(4), 1–24 (2020)
Lee, J.S., Ebrahimi, T.: Perceptual video compression: a survey. IEEE J. Sel. Top. Signal Process. 6(6), 684–697 (2012)
Lin, Z.Y., Chen, J.L., Chen, L.G.: A 203 FPS VLSI architecture of improved dense trajectories for real-time human action recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1115–1119. IEEE (2018)
Luo, B., Li, H., Meng, F., Wu, Q., Ngan, K.N.: An unsupervised method to extract video object via complexity awareness and object local parts. IEEE Trans. Circ. Syst. Video Technol. 28(7), 1580–1594 (2017)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004)
Mishra, A., Pandey, A., Murthy, H.A.: Zero-shot learning for action recognition using synthesized features. Neurocomputing 390, 117–130 (2020)
Nagle, F., Lavie, N.: Predicting human complexity perception of real-world scenes. R. Soc. Open Sci. 7(5), 191487 (2020)
Oliva, A., Mack, M.L., Shrestha, M., Peeper, A.: Identifying the perceptual dimensions of visual complexity of scenes. In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 26 (2004)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 (2015)
Rosenholtz, R., Li, Y., Mansfield, J., Jin, Z.: Feature congestion: a measure of display clutter. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 761–770 (2005)
Sahaf, Y., Krishnan, N.C., Cook, D.J.: Defining the complexity of an activity. In: Activity Context Representation (2011)
Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)
Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)
Standish, R.K.: Concept and definition of complexity. In: Intelligent Complex Adaptive Systems, pp. 105–124. IGI Global (2008)
Tokmakov, P., Hebert, M., Schmid, C.: Unsupervised learning of video representations via dense trajectory clustering. arXiv preprint arXiv:2006.15731 (2020)
Wang, H., Oneata, D., Verbeek, J., Schmid, C.: A robust and efficient video representation for action recognition. Int. J. Comput. Vis. 119(3), 219–238 (2016)
Wang, L., Koniusz, P., Huynh, D.Q.: Hallucinating IDT descriptors and I3D optical flow features for action recognition with CNNs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8698–8708 (2019)
© 2021 Springer Nature Switzerland AG
Burgos-Madrigal, A., Altamirano-Robles, L. (2021). Video and Image Complexity in Human Action Recognition. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2021. Lecture Notes in Computer Science, vol 13055. Springer, Cham. https://doi.org/10.1007/978-3-030-89691-1_34
DOI: https://doi.org/10.1007/978-3-030-89691-1_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89690-4
Online ISBN: 978-3-030-89691-1