Global Contrast Based Salient Region Boundary Sampling for Action Recognition

Xu, Zengmin; Hu, Ruimin; Chen, Jun; Chen, Huafeng; Li, Hongyang

doi:10.1007/978-3-319-27671-7_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Although improved Dense Trajectory (iDT) features have demonstrated excellent representation ability for action videos on several action datasets, action recognition performance still suffers from large camera motion in videos. In this paper, we improve the iDT method by proposing a novel dense sampling strategy based on salient region boundaries, which reduces the number of trajectories while preserving discriminative power. We first implement iDT sampling based on the motion boundary image, then introduce a global contrast based salient object segmentation method into the interest point sampling step of action recognition. To overcome the flaws of sampling from global color contrast based salient regions, we apply the morphological gradient to generate a more robust mask for sampling dense points, as motion boundaries are much clearer. To evaluate the proposed method, we conduct extensive experiments on two benchmarks, HMDB51 and UCF50. The results show that our sampling strategy improves action recognition performance at a minor computational cost for mask production. In particular, on the HMDB51 dataset, the improvement over the original iDT result is 3%. Moreover, other dense features for action recognition can achieve more competitive performance simply by using our sampling strategy together with Fisher vector encoding.
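The mask-then-sample idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary saliency mask is already available (the global contrast based segmentation itself is not reproduced here), uses a 3x3 square structuring element, and keeps regular grid points that fall on the resulting boundary. The function names, the `step` parameter, and the toy mask are all hypothetical.

```python
import numpy as np

def _shifted_stack(mask, pad_value):
    """Stack the 9 shifted copies of a 2-D boolean mask seen through a 3x3 window."""
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def morphological_gradient(mask):
    """Dilation minus erosion of a binary mask (3x3 square element, an assumption)."""
    mask = np.asarray(mask, dtype=bool)
    dilated = _shifted_stack(mask, False).any(axis=0)  # pad with background
    eroded = _shifted_stack(mask, True).all(axis=0)    # pad with foreground
    return dilated & ~eroded

def sample_on_boundary(boundary, step=5):
    """Keep regular grid points (spacing `step`) that lie on the boundary mask.

    Returns an (N, 2) integer array of (x, y) coordinates.
    """
    h, w = boundary.shape
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    keep = boundary[ys, xs]
    return np.column_stack([xs[keep], ys[keep]])

# Toy stand-in for a saliency mask: one square "salient object".
saliency = np.zeros((20, 20), dtype=bool)
saliency[5:15, 5:15] = True
boundary = morphological_gradient(saliency)
points = sample_on_boundary(boundary, step=2)
```

In this sketch the interior of the salient region contributes no sample points, so the number of candidate points shrinks to those near the object outline, which is the effect the abstract attributes to boundary-based sampling.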

Acknowledgement

The research was supported by the National Natural Science Foundation of China (61231015, 61170023, 61367002), the National High Technology Research and Development Program of China (863 Program) (2015AA016306, 2013AA014602), the Internet of Things Development Funding Project of the Ministry of Industry in 2013 (25), the Technology Research Program of the Ministry of Public Security (2014JSYJA016), the Major Science and Technology Innovation Plan of Hubei Province (2013AAA020), and the Natural Science Foundation of Hubei Province (2014CFB712).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University, Wuhan, China
Zengmin Xu, Ruimin Hu, Jun Chen, Huafeng Chen & Hongyang Li
Collaborative Innovation Center of Geospatial Technology, Wuhan, China
Ruimin Hu & Jun Chen
School of Mathematics and Computing Science, Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation, Guilin University of Electronic Technology, Guilin, China
Zengmin Xu

Corresponding author

Correspondence to Ruimin Hu.

Editor information

Editors and Affiliations

University of Texas at San Antonio, San Antonio, USA
Qi Tian
Dept. of Information Engineering, University of Trento, Povo, Trento, Italy
Nicu Sebe
EECS, University of Central Florida, Orlando, Florida, USA
Guo-Jun Qi
EURECOM, Sophia-Antipolis, France
Benoit Huet
Hefei University of Technology, Hefei, Anhui, China
Richang Hong
School of Computing and Information, Hefei University of Technology, Hefei, Anhui, China
Xueliang Liu

About this paper

Cite this paper

Xu, Z., Hu, R., Chen, J., Chen, H., Li, H. (2016). Global Contrast Based Salient Region Boundary Sampling for Action Recognition. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_16

DOI: https://doi.org/10.1007/978-3-319-27671-7_16
Published: 03 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science, Computer Science (R0)