A Dense SURF and Triangulation Based Spatio-temporal Feature for Action Recognition

  • Conference paper
MultiMedia Modeling (MMM 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8325)

Abstract

In this paper, we propose a novel method for extracting spatio-temporal features from videos. Given a video, we extract features from every set of N consecutive frames, where N is small enough to guarantee that the features are temporally dense. For each frame set, we first extract dense SURF keypoints from its first frame. We then select the points whose movements are most likely dominant and reliable, and treat them as interest points. Next, we group the interest points into triangles using Delaunay triangulation and track the points of each triple through the frame set. From each triangle we extract one spatio-temporal feature that combines its shape with the visual features and optical flows of its points. This allows us to build spatio-temporal features from groups of related points and their trajectories, so the features can be expected to be robust and informative. We apply Fisher Vector encoding to represent videos using the proposed spatio-temporal features. Experiments on several challenging benchmarks demonstrate the effectiveness of our method.
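
The pipeline the abstract describes (dense SURF sampling, motion-based interest point selection, Delaunay triangulation, per-triangle tracking) can be outlined in code. The following is a minimal sketch, not the authors' implementation: it assumes an OpenCV build with the non-free xfeatures2d module (for SURF) plus NumPy and SciPy, and the frame-set length N, grid stride, and flow threshold are illustrative values chosen here, not taken from the paper. The way the per-triangle feature is assembled (triangle vertex coordinates concatenated with SURF descriptors and frame-to-frame flows) is likewise an approximation of the description, not the paper's exact descriptor.

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

N = 5            # frames per set (assumed small, per the abstract)
GRID_STEP = 8    # dense sampling stride in pixels (illustrative)
MIN_FLOW = 1.0   # mean per-frame displacement needed to keep a point (illustrative)

def dense_surf_keypoints(gray, step=GRID_STEP):
    """Place SURF keypoints on a regular grid and compute their descriptors."""
    surf = cv2.xfeatures2d.SURF_create()
    kps = [cv2.KeyPoint(float(x), float(y), float(step))
           for y in range(step, gray.shape[0] - step, step)
           for x in range(step, gray.shape[1] - step, step)]
    kps, desc = surf.compute(gray, kps)
    return np.float32([kp.pt for kp in kps]), desc

def features_for_frame_set(frames):
    """Extract one spatio-temporal feature per Delaunay triangle in a set of N frames."""
    gray = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    pts, desc = dense_surf_keypoints(gray[0])

    # Track every grid point through the set with pyramidal Lucas-Kanade flow.
    tracks, ok = [pts], np.ones(len(pts), dtype=bool)
    p = pts.reshape(-1, 1, 2)
    for prev, nxt in zip(gray[:-1], gray[1:]):
        p, st, _ = cv2.calcOpticalFlowPyrLK(prev, nxt, p, None)
        ok &= st.ravel() == 1
        tracks.append(p.reshape(-1, 2))

    # Interest points: tracked reliably in every frame and moving noticeably.
    disp = np.linalg.norm(tracks[-1] - tracks[0], axis=1) / (len(frames) - 1)
    idx = np.flatnonzero(ok & (disp > MIN_FLOW))
    if len(idx) < 3:
        return []

    # Triangulate the interest points of the first frame and build one
    # feature per triangle: shape + visual descriptors + point-wise flows.
    tri = Delaunay(tracks[0][idx])
    feats = []
    for simplex in tri.simplices:
        tri_idx = idx[simplex]
        shape = tracks[0][tri_idx].ravel()
        visual = desc[tri_idx].ravel()
        motion = np.concatenate([(tracks[t + 1] - tracks[t])[tri_idx].ravel()
                                 for t in range(len(frames) - 1)])
        feats.append(np.concatenate([shape, visual, motion]))
    return feats
```

For the video-level representation, the abstract applies Fisher Vector encoding to these local features. Below is a compact sketch of the standard diagonal-covariance formulation, using scikit-learn's GaussianMixture (a choice made here for illustration); the power and L2 normalisation steps are the ones commonly applied to Fisher Vectors, not details stated in the abstract.

```python
from sklearn.mixture import GaussianMixture

def fisher_vector(X, gmm):
    """Encode local features X (n x d) as one Fisher Vector under a diagonal GMM."""
    n = len(X)
    q = gmm.predict_proba(X)                                  # n x K soft assignments
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_   # K x d, K x d, K
    diff = (X[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]
    # Normalised gradients w.r.t. the Gaussian means and variances.
    g_mu = (q[:, :, None] * diff).sum(0) / (n * np.sqrt(w)[:, None])
    g_var = (q[:, :, None] * (diff ** 2 - 1)).sum(0) / (n * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalisation

# Usage: fit the GMM on local features pooled over training videos, e.g.
#   gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(train_feats)
# then encode each video as fisher_vector(np.vstack(video_feats), gmm).
```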

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Nga, D.H., Yanai, K. (2014). A Dense SURF and Triangulation Based Spatio-temporal Feature for Action Recognition. In: Gurrin, C., Hopfgartner, F., Hurst, W., Johansen, H., Lee, H., O’Connor, N. (eds) MultiMedia Modeling. MMM 2014. Lecture Notes in Computer Science, vol 8325. Springer, Cham. https://doi.org/10.1007/978-3-319-04114-8_32

  • DOI: https://doi.org/10.1007/978-3-319-04114-8_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04113-1

  • Online ISBN: 978-3-319-04114-8

  • eBook Packages: Computer Science (R0)
