Abstract
Human action recognition from visual data is a popular topic in Computer Vision, with applications in a wide range of domains. State-of-the-art solutions often rely on deep-learning approaches based on RGB videos and pre-computed optical flow maps. Recently, projections onto 3D Gray-Code Kernels (GCKs) have been shown to be an alternative way of representing motion, able to efficiently capture space-time structures. In this work, we investigate the use of GCK pooling maps, which we call GCK-Maps, as input for addressing human action recognition with CNNs. We provide an experimental comparison with RGB frames and optical flow in terms of accuracy, efficiency, and scene-bias dependency. Our results show that GCK-Maps generally represent a valuable alternative to optical flow and RGB frames, with a significant reduction of the computational burden.
E. Nicora and V. P. Pastore—These authors contributed equally to this work.
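To make the idea concrete, below is a minimal, hypothetical sketch of how such motion maps could be computed. It assumes separable 3D Walsh-Hadamard (Gray-Code) kernels applied by plain filtering and a max pooling over kernel responses; the actual pipeline relies on the efficient incremental GCK scheme and may differ in kernel selection and pooling. The function `gck_maps`, the basis `WH4`, and the `kernel_ids` parameter are illustrative names, not part of the paper.

```python
# Hypothetical sketch (not the authors' exact pipeline): project a video clip
# onto a small bank of separable 3D Walsh-Hadamard (Gray-Code) kernels and
# pool the absolute responses into one 2D map per frame.
import numpy as np
from scipy.ndimage import correlate1d

# 1D Walsh-Hadamard basis of length 4 (rows); 3D kernels are built as outer
# products along (t, y, x). The GCK scheme would compute these projections
# incrementally; here we use plain separable filtering for clarity.
WH4 = np.array([[1,  1,  1,  1],
                [1,  1, -1, -1],
                [1, -1, -1,  1],
                [1, -1,  1, -1]], dtype=np.float32)

def gck_maps(clip, kernel_ids=((1, 0, 0), (0, 1, 0), (0, 0, 1))):
    """clip: (T, H, W) grayscale video; returns a (T, H, W) pooled projection map."""
    responses = []
    for kt, ky, kx in kernel_ids:          # one separable 3D kernel per index triplet
        r = correlate1d(clip, WH4[kt], axis=0, mode='nearest')   # temporal filtering
        r = correlate1d(r,    WH4[ky], axis=1, mode='nearest')   # vertical filtering
        r = correlate1d(r,    WH4[kx], axis=2, mode='nearest')   # horizontal filtering
        responses.append(np.abs(r))
    return np.max(np.stack(responses), axis=0)  # pool across kernels (assumed: max)

# Usage: maps = gck_maps(video)  # video: float array of shape (T, H, W)
# Each frame of `maps` can then be normalized/stacked and fed to a 2D CNN.
```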
Notes
- 1. The dataset will be released soon.
- 2. Implementation available at http://people.csail.mit.edu/celiu/OpticalFlow/.
Acknowledgments
This work has been supported by AFOSR under grant no. FA8655-20-1-7035. VPP was supported by FSE REACT-EU-PON 2014-2020, DM 1062/2021.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nicora, E., Pastore, V.P., Noceti, N. (2023). GCK-Maps: A Scene Unbiased Representation for Efficient Human Action Recognition. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43147-0
Online ISBN: 978-3-031-43148-7