Abstract
Despite the impressive progress of deep learning models on image classification, they still require improvement for efficient human action recognition. One way to achieve such a gain is to augment the existing datasets. With this goal, we propose the use of multiple Visual Rhythm crops, symmetrically extended in time and separated by a fixed stride. The symmetric extension preserves the video frame rate, which is crucial to avoid distorting actions. The crops provide a 2D representation of the video volume that matches the fixed input size of the 2D Convolutional Neural Network (CNN) employed, and taking multiple crops with a fixed stride guarantees coverage of the entire video. To evaluate our method, a multi-stream strategy combining RGB and optical flow information is extended to include the Visual Rhythm. Experiments on the challenging UCF101 and HMDB51 datasets yield accuracy rates close to the state of the art.
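The pipeline the abstract describes — slice a Visual Rhythm image from the video volume, mirror-extend it in time, and cut fixed-size strided crops — can be illustrated with a minimal NumPy sketch. The function names, the choice of the central row as the sampling line, and the end-anchored final crop are our illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def visual_rhythm(video):
    """Stack one line per frame into a 2D image over time.
    video: (T, H, W, C) array; here we sample the central row of
    each frame, giving a rhythm image of shape (T, W, C)."""
    return video[:, video.shape[1] // 2]

def symmetric_extend(rhythm, min_len):
    """Extend the rhythm in time by mirroring it until it has at
    least min_len rows. Mirroring replays existing frames instead
    of resampling, so the effective frame rate is unchanged."""
    out = rhythm
    forward = True
    while out.shape[0] < min_len:
        forward = not forward
        out = np.concatenate(
            [out, rhythm if forward else rhythm[::-1]], axis=0)
    return out

def strided_crops(rhythm, crop_len, stride):
    """Cut fixed-size temporal crops separated by a fixed stride.
    A final crop anchored at the end guarantees the whole rhythm is
    covered. Assumes the rhythm was already extended so that its
    length is at least crop_len."""
    T = rhythm.shape[0]
    starts = list(range(0, T - crop_len + 1, stride))
    if starts[-1] != T - crop_len:
        starts.append(T - crop_len)
    return [rhythm[s:s + crop_len] for s in starts]
```

Each crop is then a fixed-size 2D image (per channel) that fits the input of a 2D CNN; at test time the per-crop predictions would be aggregated (e.g., averaged) into a single video-level score.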
All authors thank CAPES, FAPEMIG (grant CEX-APQ-01744-15), FAPESP (grants #2017/09160-1 and #2017/12646-3), CNPq (grant #305169/2015-7) for the financial support, and NVIDIA for the grant of two GPUs (GPU Grant Program).
© 2019 Springer Nature Switzerland AG
Cite this paper
Tacon, H., et al.: Human Action Recognition Using Convolutional Neural Networks with Symmetric Time Extension of Visual Rhythms. In: Misra, S., et al. (eds.) Computational Science and Its Applications – ICCSA 2019. LNCS, vol. 11619. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-24289-3_26
Print ISBN: 978-3-030-24288-6
Online ISBN: 978-3-030-24289-3