Abstract
A discriminative approach based on the 3DPyraNet model for spatiotemporal feature learning is proposed. In combination with a linear SVM classifier, our model outperform state-of-the-art methods on two datasets (KTH, Weizmann). Whereas, shows comparable result with current best methods on third dataset (YUPENN). The features are compact, achieving \(94.08\,\%\), \(99.13\,\%\), and 94.67 % accuracy on KTH, Weizmann, and YUPENN, respectively. The proposed model appears more suitable for spatiotemporal feature learning compared to traditional feature learning techniques; also, the number of parameters is far less than other 3DConvNets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Laptev, I., Marszaek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2008)
Wang, H., Ullah, M.M., Klaser, A., Laptev, I., Schmid, C.: Evaluation of local spatiotemporal features for action recognition. In: BMVC 2009 - British Machine Vision Conference, pp. 124.1–124.11 (2009)
Wang, H., Kläser, A., Schmid, C., Cheng-Lin, L.: Action recognition by dense trajectories. In: CVPR 2011 - IEEE Conference on Computer Vision & Pattern Recognition, Colorado Springs, United States, pp. 3169–3176. IEEE, June 2011
Schüldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings - International Conference on Pattern Recognition, vol. 3, pp. 32–36 (2004)
Derpanis, K.G., Lecce, M., Daniilidis, K., Wildes, R.P.: Dynamic scene understanding: the role of orientation features in space and time in scene classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1306–1313 (2012)
Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. CoRR abs/1212.0402 (2012)
Le, Q.V., Zou, W.Y., Yeung, S.Y., Ng, A.Y.: Learning hierarchical invariant spatiotemporal features for action recognition with independent subspace analysis. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3361–3368 (2011)
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: ICCV, pp. 1725–1732. IEEE, June 2015
Scovanner, P., Ali, S., Shah, M.: A 3-dimensional sift descriptor and its application to action recognition. In: Proceedings of the ACM International Conference on Multimedia (MM 2007), pp. 357–360 (2007)
Klaser, A., Marszalek, M., Schmid, C.: A spatiotemporal descriptor based on 3D-gradients. In: Proceedings of the British Machine Conference, pp. 99.1–99.10 (2008)
Willems, G., Tuytelaars, T., Gool, L.: An efficient dense and scale-invariant spatiotemporal interest point detector. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5303, pp. 650–663. Springer, Heidelberg (2008). doi:10.1007/978-3-540-88688-4_48
Yeffet, L., Wolf, L.: Local trinary patterns for human action recognition. In: IEEE 12th International Conference on Computer Vision, pp. 492–497, September 2009
Taylor, G.W., Fergus, R., LeCun, Y., Bregler, C.: Convolutional learning of spatiotemporal features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6316, pp. 140–153. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15567-3_11
Freitas, N.D.: Deep learning of invariant spatiotemporal features from video. In: Workshop on Deep Learning and Unsupervised Feature Learning in NIPS, pp. 1–9 (2010)
Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., Baskurt, A.: Sequential deep learning for human action recognition. In: Salah, A.A., Lepri, B. (eds.) HBU 2011. LNCS, vol. 7065, pp. 29–39. Springer, Heidelberg (2011). doi:10.1007/978-3-642-25446-8_4
Ullah, I., Petrosino, A.: A strict pyramidal deep neural network for action recognition. In: Murino, V., Puppo, E. (eds.) ICIAP 2015. LNCS, vol. 9279, pp. 236–245. Springer, Heidelberg (2015)
Ji, S., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
Simonyan, K., Zisserman, A.: Two-Stream Convolutional Networks for Action Recognition in Videos. arXiv preprint arXiv:1406.2199, pp. 1–11, June 2014
Uetz, R., Behnke, S.: Locally-connected hierarchical neural networks for gpu-accelerated object recognition. In: NIPS: Workshop on Large-Scale Machine Learning: Parallelism and Massive Datasets, Whistler, Canada, pp. 10–13, December 2009
Cantoni, V., Petrosino, A.: Neural recognition in a pyramidal structure. IEEE Trans. Neural Netw. 13(2), 472–480 (2002)
Phung, S.L., Bouzerdoum, A.: A pyramidal neural network for visual pattern recognition. IEEE Trans. Neural Netw. Publ. IEEE Neural Netw. Counc. 18(2), 329–343 (2007)
Maddalena, L., Petrosino, A.: The 3dsobs+ algorithm for moving object detection. Comput. Vis. Image Underst. 122, 65–73 (2014)
Karpathy, A., Leung, T.: Large-scale video classification with convolutional neural networks. In: Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)
Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: Tenth IEEE International Conference on Computer Vision (ICCV 2005), vol. 1, pp. 1395–1402 (2005). Vol. 2
MATLAB: Matlab version 8.4.0.150421 (R2014b). The MathWorks Inc., Natick, Massachusetts (2014)
Maninis, K., Koutras, P., Maragos, P.: Advances on action recognition in videos using an interest point detector based on multiband spatiotemporal energies. In: 2014 IEEE International Conference on Image Processing, ICIP 2014, Paris, France, October 27–30, 2014, pp. 1490–1494 (2014)
Chen, B., Ting, J.A., Marlin, B., de Freitas, N.: Deep learning of invariant spatiotemporal features from video. In: NIPS 2010 Deep Learning and Unsupervised Feature Learning Workshop (2010)
Dollár, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatiotemporal features. In: Proceedings - 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, VS-PETS 2005, pp. 65–72 (2005)
Weinland, D., Özuysal, M., Fua, P.: Making action recognition robust to occlusions and viewpoint changes. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6313, pp. 635–648. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15558-1_46
Feichtenhofer, C., Pinz, A., Wildes, R.P.: Spacetime forests with complementary features for dynamic scene recognition. In: BMVC (2013)
Theriault, C., Thome, N., Cord, M.: Dynamic scene classification: Learning motion descriptors with slow features analysis. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2603–2610, June 2013
Feichtenhofer, C., Pinz, A., Wildes, R.: Bags of spacetime energies for dynamic scene recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2681–2688 (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Han, S., Pool, J., Tran, J., Dally, W.J.: Learning both weights and connections for efficient neural networks. CoRR abs/1506.02626 (2015)
Lin, M., Chen, Q., Yan, S.: Network in network. CoRR abs/1312.4400 (2013)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR, USA, June 7–12, pp. 1–9 (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Ullah, I., Petrosino, A. (2016). Spatiotemporal Features Learning with 3DPyraNet. In: Blanc-Talon, J., Distante, C., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2016. Lecture Notes in Computer Science(), vol 10016. Springer, Cham. https://doi.org/10.1007/978-3-319-48680-2_56
Download citation
DOI: https://doi.org/10.1007/978-3-319-48680-2_56
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48679-6
Online ISBN: 978-3-319-48680-2
eBook Packages: Computer ScienceComputer Science (R0)