Fine-Grained Activity Recognition with Holistic and Pose Based Features

Pishchulin, Leonid; Andriluka, Mykhaylo; Schiele, Bernt

doi:10.1007/978-3-319-11752-2_56

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8753)

Included in the following conference series: German Conference on Pattern Recognition

Abstract

Holistic methods based on dense trajectories [29, 30] are currently the de facto standard for recognition of human activities in video. Whether holistic representations will endure or be superseded by higher-level video encoding in terms of body pose and motion is the subject of an ongoing debate [12]. In this paper we aim to clarify the underlying factors responsible for the good performance of holistic and pose-based representations. To that end we build on our recent dataset [2], leveraging the existing taxonomy of human activities. This dataset includes 24,920 video snippets covering 410 human activities in total. Our analysis reveals that holistic and pose-based methods are highly complementary, and that their performance varies significantly depending on the activity. We find that holistic methods are mostly affected by the number and speed of trajectories, whereas pose-based methods are mostly influenced by the viewpoint of the person. We observe striking performance differences across activities: for certain activities, results with pose-based features are more than twice as accurate as results with holistic features, and vice versa. The best performing approach in our comparison combines holistic and pose-based approaches, which again underlines their complementarity.

References

1. Ainsworth, B., Haskell, W., Herrmann, S., Meckes, N., Bassett, D., Tudor-Locke, C., Greer, J., Vezina, J., Whitt-Glover, M., Leon, A.: 2011 compendium of physical activities: a second update of codes and MET values. MSSE 43(8), 1575–1581 (2011)
2. Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR’14
3. Brendel, W., Todorovic, S.: Learning spatiotemporal graphs of human activities. In: ICCV’11
4. Cardinaux, F., Bhowmik, D., Abhayaratne, C., Hawley, M.S.: Video based technology for ambient assisted living: a review of the literature. J. Ambient Intell. Smart Environ. 3(3), 253–269 (2011)
5. Chakraborty, B., Holte, M.B., Moeslund, T.B., Gonzalez, J., Xavier Roca, F.: A selective spatio-temporal interest point detector for human action recognition in complex scenes. In: ICCV’11
6. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR’05
7. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: ECCV’06
8. Dantone, M., Gall, J., Leistner, C., Van Gool, L.: Human pose estimation using body parts dependent joint regressors. In: CVPR’13
9. Duchenne, O., Laptev, I., Sivic, J., Bach, F., Ponce, J.: Automatic annotation of human actions in video. In: ICCV’09
10. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
11. Ferrari, V., Marin, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: CVPR’08
12. Jhuang, H., Gall, J., Zuffi, S., Schmid, C., Black, M.J.: Towards understanding action recognition. In: ICCV’13
13. Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: ICCV’11
14. Laptev, I.: On space-time interest points. IJCV 64(2/3), 107–123 (2005)
15. Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: CVPR’08
16. Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos in the wild. In: CVPR’09
17. Marszałek, M., Laptev, I., Schmid, C.: Actions in context. In: CVPR’09
18. Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Poselet conditioned pictorial structures. In: CVPR’13
19. Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Strong appearance and expressive spatial models for human pose estimation. In: ICCV’13
20. Rodriguez, M.D., Ahmed, J., Shah, M.: Action MACH: a spatio-temporal maximum average correlation height filter for action recognition. In: CVPR’08
21. Rohrbach, M., Amin, S., Andriluka, M., Schiele, B.: A database for fine grained activity detection of cooking activities. In: CVPR’12
22. Rohrbach, M., Stark, M., Schiele, B.: Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In: CVPR’11
23. Sadanand, S., Corso, J.J.: Action bank: a high-level representation of activity in video. In: ECCV’12
24. Sapp, B., Taskar, B.: Multimodal decomposable models for human pose estimation. In: CVPR’13
25. Singh, V.K., Nevatia, R.: Action recognition in cluttered dynamic scenes using pose-specific part models. In: ICCV’11
26. Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human action classes from videos in the wild. Technical report CRCV-TR-12-01, UCF (2012)
27. Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. In: CVPR’10
28. Vishwakarma, S., Agrawal, A.: A survey on activity recognition and behavior understanding in video surveillance. VC 29(10), 983–1009 (2013)
29. Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Dense trajectories and motion boundary descriptors for action recognition. IJCV 103(1), 60–79 (2013)
30. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: ICCV’13
31. Wang, H., Ullah, M.M., Kläser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition. In: BMVC’09
32. Yang, Y., Ramanan, D.: Articulated human detection with flexible mixtures of parts. PAMI 61(1), 55–79 (2013)

Acknowledgements

The authors would like to thank Marcus Rohrbach and Sikandar Amin for helpful discussions. This work has been supported by the Max Planck Center for Visual Computing & Communication.

Author information

Authors and Affiliations

Max Planck Institute for Informatics, Saarbrücken, Germany
Leonid Pishchulin, Mykhaylo Andriluka & Bernt Schiele
Stanford University, Stanford, USA
Mykhaylo Andriluka

Corresponding author

Correspondence to Leonid Pishchulin.

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Münster, Münster, Germany
Xiaoyi Jiang
Computer Science Department 5, University of Erlangen-Nürnberg, Erlangen, Germany
Joachim Hornegger
Department of Computer Science, University of Kiel, Kiel, Germany
Reinhard Koch

About this paper

Cite this paper

Pishchulin, L., Andriluka, M., Schiele, B. (2014). Fine-Grained Activity Recognition with Holistic and Pose Based Features. In: Jiang, X., Hornegger, J., Koch, R. (eds) Pattern Recognition. GCPR 2014. Lecture Notes in Computer Science, vol 8753. Springer, Cham. https://doi.org/10.1007/978-3-319-11752-2_56

DOI: https://doi.org/10.1007/978-3-319-11752-2_56
Published: 15 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11751-5
Online ISBN: 978-3-319-11752-2
eBook Packages: Computer Science, Computer Science (R0)
