Spatiotemporal Features Learning with 3DPyraNet

  • Conference paper
Advanced Concepts for Intelligent Vision Systems (ACIVS 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10016)

Abstract

A discriminative approach based on the 3DPyraNet model for spatiotemporal feature learning is proposed. In combination with a linear SVM classifier, our model outperforms state-of-the-art methods on two datasets (KTH and Weizmann) and shows results comparable to the current best methods on a third dataset (YUPENN). The learned features are compact, and the model achieves 94.08 %, 99.13 %, and 94.67 % accuracy on KTH, Weizmann, and YUPENN, respectively. The proposed model appears more suitable for spatiotemporal feature learning than traditional feature learning techniques; moreover, it has far fewer parameters than other 3DConvNets.

Author information

Corresponding author

Correspondence to Ihsan Ullah.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ullah, I., Petrosino, A. (2016). Spatiotemporal Features Learning with 3DPyraNet. In: Blanc-Talon, J., Distante, C., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2016. Lecture Notes in Computer Science, vol 10016. Springer, Cham. https://doi.org/10.1007/978-3-319-48680-2_56

  • DOI: https://doi.org/10.1007/978-3-319-48680-2_56

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48679-6

  • Online ISBN: 978-3-319-48680-2

  • eBook Packages: Computer Science, Computer Science (R0)
