Abstract
In this paper, we propose a novel method for extracting spatio-temporal features from videos. Given a video, we extract features from every set of N consecutive frames, where N is small enough to guarantee the temporal density of the features. For each frame set, we first extract dense SURF keypoints from its first frame. We then select the points with the most dominant and reliable movements as interest points. Next, we group the interest points into triangles using Delaunay triangulation and track the points of each triangle through the frame set. From each triangle we extract one spatio-temporal feature based on its shape together with the visual features and optical flows of its points. This lets us build spatio-temporal features from groups of related points and their trajectories, so the features can be expected to be both robust and informative. We apply Fisher Vector encoding to represent videos using the proposed spatio-temporal features. Experiments on several challenging benchmarks show the effectiveness of the proposed method.
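As a rough illustration of the triangulation step described above (a minimal sketch, not the authors' implementation: SURF detection, movement-based point selection, and tracking are omitted, and the point coordinates are placeholders), assuming SciPy is available:

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical 2D interest-point locations selected in the first frame
# of a frame set (placeholder coordinates, not real SURF output).
points = np.array([
    [10.0, 10.0], [50.0, 12.0], [30.0, 40.0],
    [70.0, 35.0], [20.0, 70.0], [60.0, 65.0],
])

# Delaunay triangulation groups neighboring interest points into triangles;
# each triangle's three points would then be tracked through the N frames.
tri = Delaunay(points)
triangles = tri.simplices  # shape (num_triangles, 3): indices into `points`

for t in triangles:
    # In the proposed method, one spatio-temporal feature would be built per
    # triangle from its shape, the visual descriptors of its vertices,
    # and the optical flows of those vertices across the frame set.
    print(points[t])
```

Grouping points by triangulation rather than describing each trajectory independently is what allows the feature to capture the relative geometry of co-moving points.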
© 2014 Springer International Publishing Switzerland
Nga, D.H., Yanai, K. (2014). A Dense SURF and Triangulation Based Spatio-temporal Feature for Action Recognition. In: Gurrin, C., Hopfgartner, F., Hurst, W., Johansen, H., Lee, H., O’Connor, N. (eds) MultiMedia Modeling. MMM 2014. Lecture Notes in Computer Science, vol 8325. Springer, Cham. https://doi.org/10.1007/978-3-319-04114-8_32
Print ISBN: 978-3-319-04113-1
Online ISBN: 978-3-319-04114-8