Abstract
Human action recognition from visual data is a popular topic in Computer Vision, with applications in a wide range of domains. State-of-the-art solutions often rely on deep-learning approaches based on RGB videos and pre-computed optical flow maps. Recently, projections onto 3D Gray-Code Kernels (GCKs) have been shown to be an alternative way of representing motion, able to efficiently capture space-time structures. In this work, we investigate the use of GCK pooling maps, which we call GCK-Maps, as input for addressing human action recognition with CNNs. We provide an experimental comparison with RGB frames and optical flow in terms of accuracy, efficiency, and scene-bias dependency. Our results show that GCK-Maps generally represent a valuable alternative to optical flow and RGB frames, with a significant reduction of the computational burden.
E. Nicora and V. P. Pastore—These authors contributed equally to this work.
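To make the idea concrete, below is a minimal, hypothetical sketch of how such motion maps could be computed. It assumes separable 3D Walsh-Hadamard (Gray-Code) kernels applied by plain filtering and a max pooling over kernel responses; the actual pipeline relies on the efficient incremental GCK scheme and may differ in kernel selection and pooling. The function `gck_maps`, the basis `WH4`, and the `kernel_ids` parameter are illustrative names, not part of the paper.

```python
# Hypothetical sketch (not the authors' exact pipeline): project a video clip
# onto a small bank of separable 3D Walsh-Hadamard (Gray-Code) kernels and
# pool the absolute responses into one 2D map per frame.
import numpy as np
from scipy.ndimage import correlate1d

# 1D Walsh-Hadamard basis of length 4 (rows); 3D kernels are built as outer
# products along (t, y, x). The GCK scheme would compute these projections
# incrementally; here we use plain separable filtering for clarity.
WH4 = np.array([[1,  1,  1,  1],
                [1,  1, -1, -1],
                [1, -1, -1,  1],
                [1, -1,  1, -1]], dtype=np.float32)

def gck_maps(clip, kernel_ids=((1, 0, 0), (0, 1, 0), (0, 0, 1))):
    """clip: (T, H, W) grayscale video; returns a (T, H, W) pooled projection map."""
    responses = []
    for kt, ky, kx in kernel_ids:          # one separable 3D kernel per index triplet
        r = correlate1d(clip, WH4[kt], axis=0, mode='nearest')   # temporal filtering
        r = correlate1d(r,    WH4[ky], axis=1, mode='nearest')   # vertical filtering
        r = correlate1d(r,    WH4[kx], axis=2, mode='nearest')   # horizontal filtering
        responses.append(np.abs(r))
    return np.max(np.stack(responses), axis=0)  # pool across kernels (assumed: max)

# Usage: maps = gck_maps(video)  # video: float array of shape (T, H, W)
# Each frame of `maps` can then be normalized/stacked and fed to a 2D CNN.
```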
Notes
- 1. The dataset will be released soon.
- 2. Implementation available at http://people.csail.mit.edu/celiu/OpticalFlow/.
Acknowledgments
This work has been supported by AFOSR under grant no. FA8655-20-1-7035. VPP was supported by FSE REACT-EU-PON 2014-2020, DM 1062/2021.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nicora, E., Pastore, V.P., Noceti, N. (2023). GCK-Maps: A Scene Unbiased Representation for Efficient Human Action Recognition. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43147-0
Online ISBN: 978-3-031-43148-7