Abstract
We analyze the relation between video complexity and the performance of Human Action Recognition (HAR) algorithms. The rationale is that variations in image conditions (e.g., occlusion, camera movement, resolution, and illumination) and in image content (e.g., edge density and number of objects), both of which reflect scene complexity, increase the difficulty of recognizing activities for a computational model. The HAR algorithms used in this work, improved Dense Trajectories (iDT) [25], Motion-Augmented RGB Stream for Action Recognition (MARS) [5], and SlowFast [7], are compared against the number of people and objects in the scene and against three statistical measures: entropy, number of regions, and edge density. The results so far show a correlation between complexity and classification performance. The Mask R-CNN runs for counting scene elements were carried out on the supercomputer cluster of LSC-INAOE.
Supported by CONACyT.
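Two of the statistical measures named in the abstract, entropy and edge density, can be computed directly from an image's intensity histogram and edge map. The sketch below is illustrative only, not the paper's implementation: the paper uses the Canny detector [6] for edges, whereas this dependency-free version substitutes a simple finite-difference gradient, and the function names and the threshold value are assumptions.

```python
import numpy as np

def shannon_entropy(gray):
    """Shannon entropy (bits) of an 8-bit grayscale image's intensity histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())

def edge_density(gray, threshold=30.0):
    """Fraction of pixels whose gradient magnitude exceeds `threshold`.

    Stand-in for the Canny-based edge density [6]: a finite-difference
    gradient keeps the sketch self-contained.
    """
    g = gray.astype(float)
    gx = np.abs(np.diff(g, axis=1))[:-1, :]  # horizontal differences
    gy = np.abs(np.diff(g, axis=0))[:, :-1]  # vertical differences
    mag = np.hypot(gx, gy)
    return float((mag > threshold).mean())

# A flat image has zero entropy and no edge pixels; a 0/255 striped
# image has exactly 1 bit of entropy and edges everywhere.
flat = np.full((64, 64), 128, dtype=np.uint8)
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (64, 32))
```

A higher-entropy, denser-edge frame is taken as more complex for the recognizer, which is the correlation the paper examines.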
References
Adami, C.: What is complexity? BioEssays 24(12), 1085–1094 (2002)
Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: a review. ACM Comput. Surv. (CSUR) 43(3), 1–43 (2011)
Akpulat, M., Ekinci, M.: Detecting interaction/complexity within crowd movements using braid entropy. Front. Inf. Technol. Electron. Eng. 20(6), 849–861 (2019)
Ali, S.: Measuring flow complexity in videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1097–1104 (2013)
Crasto, N., Weinzaepfel, P., Alahari, K., Schmid, C.: MARS: motion-augmented RGB stream for action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7882–7891 (2019)
Ding, L., Goshtasby, A.: On the canny edge detector. Pattern Recogn. 34(3), 721–725 (2001)
Feichtenhofer, C., Fan, H., Malik, J., He, K.: SlowFast networks for video recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 6202–6211 (2019)
Grünwald, P.D., Vitányi, P.M.: Kolmogorov complexity and information theory. With an interpretation in terms of questions and answers. J. Log. Lang. Inf. 12(4), 497–529 (2003)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
Hiremath, S.K., Plötz, T.: Deriving effective human activity recognition systems through objective task complexity assessment. Proc. ACM Interact. Mob. Wearable Ubiquit. Technol. 4(4), 1–24 (2020)
Lee, J.S., Ebrahimi, T.: Perceptual video compression: a survey. IEEE J. Sel. Top. Signal Process. 6(6), 684–697 (2012)
Lin, Z.Y., Chen, J.L., Chen, L.G.: A 203 FPS VLSI architecture of improved dense trajectories for real-time human action recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1115–1119. IEEE (2018)
Luo, B., Li, H., Meng, F., Wu, Q., Ngan, K.N.: An unsupervised method to extract video object via complexity awareness and object local parts. IEEE Trans. Circ. Syst. Video Technol. 28(7), 1580–1594 (2017)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004)
Mishra, A., Pandey, A., Murthy, H.A.: Zero-shot learning for action recognition using synthesized features. Neurocomputing 390, 117–130 (2020)
Nagle, F., Lavie, N.: Predicting human complexity perception of real-world scenes. R. Soc. Open Sci. 7(5), 191487 (2020)
Oliva, A., Mack, M.L., Shrestha, M., Peeper, A.: Identifying the perceptual dimensions of visual complexity of scenes. In: Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 26 (2004)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 (2015)
Rosenholtz, R., Li, Y., Mansfield, J., Jin, Z.: Feature congestion: a measure of display clutter. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 761–770 (2005)
Sahaf, Y., Krishnan, N.C., Cook, D.J.: Defining the complexity of an activity. In: Activity Context Representation (2011)
Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)
Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)
Standish, R.K.: Concept and definition of complexity. In: Intelligent Complex Adaptive Systems, pp. 105–124. IGI Global (2008)
Tokmakov, P., Hebert, M., Schmid, C.: Unsupervised learning of video representations via dense trajectory clustering. arXiv preprint arXiv:2006.15731 (2020)
Wang, H., Oneata, D., Verbeek, J., Schmid, C.: A robust and efficient video representation for action recognition. Int. J. Comput. Vis. 119(3), 219–238 (2016)
Wang, L., Koniusz, P., Huynh, D.Q.: Hallucinating IDT descriptors and I3D optical flow features for action recognition with CNNs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8698–8708 (2019)
© 2021 Springer Nature Switzerland AG
Burgos-Madrigal, A., Altamirano-Robles, L. (2021). Video and Image Complexity in Human Action Recognition. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2021. Lecture Notes in Computer Science, vol 13055. Springer, Cham. https://doi.org/10.1007/978-3-030-89691-1_34
DOI: https://doi.org/10.1007/978-3-030-89691-1_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89690-4
Online ISBN: 978-3-030-89691-1