Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking

  • Conference paper
Computer Vision – ACCV 2016 (ACCV 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10112)

Abstract

In this paper, we address the problem of searching for action proposals in unconstrained video clips. Our approach starts from actionness estimation on frame-level bounding boxes, and then aggregates the bounding boxes belonging to the same actor across frames via linking, association, and tracking to generate spatio-temporally continuous action paths. To this end, a novel actionness estimation method is first proposed that uses both human appearance and motion cues. Then, the association of the action paths is formulated as a maximum set coverage problem, with the actionness estimation results as a prior. To further improve performance, we design an improved optimization objective for the problem and provide a greedy search algorithm to solve it. Finally, a tracking-by-detection scheme is designed to refine the searched action paths. Extensive experiments on two challenging datasets, UCF-Sports and UCF-101, show that the proposed approach advances state-of-the-art proposal generation performance in terms of both accuracy and proposal quantity.
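
To make the maximum set coverage formulation more concrete, the following is a minimal sketch of the textbook greedy heuristic for weighted maximum coverage. It is not the improved objective or the search algorithm proposed in the paper, whose details are not given in the abstract. The function name `greedy_max_coverage`, the toy path and box identifiers, and the idea of treating each candidate path as a set of box ids weighted by actionness scores are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(
    candidates: Dict[Hashable, Set[Hashable]],
    weights: Dict[Hashable, float],
    k: int,
) -> List[Hashable]:
    """Select up to k candidate sets, greedily maximizing covered weight.

    candidates: hypothetical mapping from a path id to the box ids it contains.
    weights:    hypothetical mapping from a box id to its actionness score.
    """
    covered: Set[Hashable] = set()
    chosen: List[Hashable] = []
    for _ in range(k):
        best_id, best_gain = None, 0.0
        for cid, elems in candidates.items():
            if cid in chosen:
                continue
            # Marginal gain: total weight of boxes not yet covered.
            gain = sum(weights.get(e, 0.0) for e in elems - covered)
            if gain > best_gain:
                best_id, best_gain = cid, gain
        if best_id is None:
            # No remaining candidate adds any uncovered weight.
            break
        chosen.append(best_id)
        covered |= candidates[best_id]
    return chosen


# Toy data (invented for this sketch, not taken from the paper).
paths = {
    "p1": {"b1", "b2", "b3"},
    "p2": {"b3", "b4"},
    "p3": {"b4", "b5", "b6"},
}
scores = {"b1": 0.9, "b2": 0.8, "b3": 0.7, "b4": 0.2, "b5": 0.6, "b6": 0.5}

selected = greedy_max_coverage(paths, scores, k=2)
print(selected)  # ['p1', 'p3']
```

In this toy case the second pick is `p3` rather than `p2`, because `p2` overlaps with the already selected `p1` and contributes only the weight of `b4`. For the standard maximum coverage objective, this greedy rule is known to achieve a (1 - 1/e) approximation of the optimum; whether the same guarantee carries over to the paper's improved objective cannot be determined from the abstract alone.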

Acknowledgement

This work was partially supported by the Shenzhen Peacock Plan (20130408-183003656), the Science and Technology Planning Project of Guangdong Province, China (No. 2014B090910001), and the China 863 Project (No. 2015AA015905).

Author information

Corresponding author

Correspondence to Ge Li.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Li, N., Xu, D., Ying, Z., Li, Z., Li, G. (2017). Searching Action Proposals via Spatial Actionness Estimation and Temporal Path Inference and Tracking. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_24

  • DOI: https://doi.org/10.1007/978-3-319-54184-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54183-9

  • Online ISBN: 978-3-319-54184-6

  • eBook Packages: Computer Science, Computer Science (R0)
