Spatiotemporal Features Learning with 3DPyraNet

Ullah, Ihsan; Petrosino, Alfredo

doi:10.1007/978-3-319-48680-2_56

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10016)

Included in the following conference series:

International Conference on Advanced Concepts for Intelligent Vision Systems

Abstract

A discriminative approach based on the 3DPyraNet model for spatiotemporal feature learning is proposed. In combination with a linear SVM classifier, our model outperforms state-of-the-art methods on two datasets (KTH and Weizmann) and shows results comparable to the current best methods on a third dataset (YUPENN). The learned features are compact, achieving 94.08 %, 99.13 %, and 94.67 % accuracy on KTH, Weizmann, and YUPENN, respectively. The proposed model appears better suited to spatiotemporal feature learning than traditional feature learning techniques; moreover, it has far fewer parameters than other 3DConvNets.
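The abstract states only that the learned 3DPyraNet features are classified with a linear SVM; it does not say which solver, regularisation strength, or multi-class scheme was used. The following is a minimal sketch of that classification stage under stated assumptions: it swaps in a Pegasos-style stochastic subgradient solver for the L2-regularised hinge loss, combines binary SVMs one-vs-rest, and uses hypothetical 2-D toy vectors and action labels in place of real 3DPyraNet feature vectors. All function names and data here are illustrative, not the authors' implementation.

```python
import random
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_binary_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.1, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Pegasos-style SGD on the L2-regularised hinge loss; y must be in {-1, +1}.

    Assumption: the bias is modelled by a constant 1.0 feature appended to each
    input, so it is regularised together with the weights (a simplification).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                     # Pegasos step size
            violated = y[i] * dot(w, X[i]) < 1.0      # margin test uses the current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the regulariser
            if violated:                              # subgradient step on the hinge loss
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def with_bias(x: Sequence[float]) -> List[float]:
    """Append the constant bias feature."""
    return list(x) + [1.0]


def train_one_vs_rest(X: List[List[float]], labels: List[str],
                      **kwargs) -> Dict[str, List[float]]:
    """Train one linear SVM per class: that class versus all others."""
    Xb = [with_bias(x) for x in X]
    models: Dict[str, List[float]] = {}
    for c in sorted(set(labels)):
        y = [1 if lab == c else -1 for lab in labels]
        models[c] = train_binary_svm(Xb, y, **kwargs)
    return models


def predict(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Return the class whose linear SVM gives the highest score."""
    xb = with_bias(x)
    return max(models, key=lambda c: dot(models[c], xb))


def accuracy(models: Dict[str, List[float]], X: List[List[float]],
             labels: List[str]) -> float:
    """Fraction of samples whose predicted class matches the label."""
    hits = sum(predict(models, x) == lab for x, lab in zip(X, labels))
    return hits / len(labels)


# Hypothetical 2-D descriptors standing in for 3DPyraNet feature vectors.
X = [[3.0, 0.2], [2.8, -0.1], [3.2, 0.0],
     [-3.0, 0.1], [-2.9, -0.2], [-3.1, 0.0],
     [0.1, 3.0], [-0.2, 2.9], [0.0, 3.1]]
labels = ["walk"] * 3 + ["run"] * 3 + ["wave"] * 3

models = train_one_vs_rest(X, labels)
print(accuracy(models, X, labels))
```

A linear classifier keeps the decision stage cheap and places the discriminative burden on the learned features; a library solver such as LIBLINEAR would be the usual practical choice over this hand-rolled loop.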

Author information

Authors and Affiliations

CVPR Lab, University of Napoli Parthenope, Napoli, Italy
Ihsan Ullah & Alfredo Petrosino
Department of Computer Science, University of Milan, Milan, Italy
Ihsan Ullah

Corresponding author

Correspondence to Ihsan Ullah.

Editor information

Editors and Affiliations

Université Paris-Sud 11, Orsay, France
Jacques Blanc-Talon
University of Salento, Lecce, Italy
Cosimo Distante
Ghent University, Gent, Belgium
Wilfried Philips
CSIRO ICT Centre, Sydney, New South Wales, Australia
Dan Popescu
University of Antwerp, Wilrijk, Belgium
Paul Scheunders

About this paper

Cite this paper

Ullah, I., Petrosino, A. (2016). Spatiotemporal Features Learning with 3DPyraNet. In: Blanc-Talon, J., Distante, C., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2016. Lecture Notes in Computer Science, vol 10016. Springer, Cham. https://doi.org/10.1007/978-3-319-48680-2_56

DOI: https://doi.org/10.1007/978-3-319-48680-2_56
Published: 21 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48679-6
Online ISBN: 978-3-319-48680-2
eBook Packages: Computer Science, Computer Science (R0)
