
uulmMAD – A Human Action Recognition Dataset for Ground-Truth Evaluation and Investigation of View Invariances

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8869)

Abstract

Human action recognition has recently gained increasing attention in pattern recognition. However, many datasets in the literature focus on a limited number of target-oriented properties. In this work, we present a novel dataset, named uulmMAD, created to benchmark state-of-the-art action recognition architectures across multiple properties, e.g. high-resolution cameras, perspective changes, realistic cluttered backgrounds and noise, overlapping action classes, different execution speeds, variability in subjects and their clothing, and the availability of a pose ground truth. uulmMAD was recorded using three synchronized high-resolution cameras and an inertial motion capturing system. Each subject performed fourteen actions at least three times in front of a green screen. Selected actions were recorded in four variants, i.e. normal, pausing, fast and decelerating. The data has been post-processed to separate the subjects from the background. Furthermore, the camera and motion capturing data have been mapped onto each other, and 3D avatars have been generated to further extend the dataset. The avatars have also been used to emulate the self-occlusion that arises in pose recognition with a time-of-flight camera. We analyze uulmMAD using a state-of-the-art action recognition architecture to provide first baseline results. The results emphasize the unique characteristics of the dataset. The dataset will be made publicly available upon publication of the paper.
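The green-screen separation described in the abstract can be approximated by simple distance-based chroma keying: a pixel is treated as foreground if its colour differs sufficiently from the known key colour. The sketch below uses a Euclidean RGB distance rule; the authors' actual post-processing pipeline is not specified here, and the key colour and threshold values are illustrative assumptions.

```python
import numpy as np

def chroma_key_mask(image, key_rgb=(0, 177, 64), threshold=80.0):
    """Return a boolean foreground mask: True where a pixel's Euclidean
    RGB distance to the key colour exceeds `threshold`."""
    diff = image.astype(np.float64) - np.asarray(key_rgb, dtype=np.float64)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist > threshold

# Toy frame: a green backdrop with a small "subject" patch in the middle.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:, :] = (0, 177, 64)        # green-screen backdrop
frame[1:3, 1:3] = (200, 120, 90)  # subject pixels
mask = chroma_key_mask(frame)     # True only on the 2x2 subject patch
```

In practice, production chroma keyers also produce soft alpha mattes and suppress green colour spill at the subject's edges rather than emitting a hard binary mask.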

M. Glodek and G. Layher contributed equally to this work.


Notes

  1. Pike F-145 from Allied Vision with a Tevidon 1,8/16 lens.

  2. Poser™ is a 3D modeling software for human avatars by Smith Micro Software.

  3. http://www.uni-ulm.de/in/neuroinformatik.html.


Acknowledgment

This paper is based on work done within the Transregional Collaborative Research Centre SFB/TRR 62 Companion-Technology for Cognitive Technical Systems funded by the German Research Foundation (DFG).

Author information

Corresponding author

Correspondence to Michael Glodek.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Glodek, M. et al. (2015). uulmMAD – A Human Action Recognition Dataset for Ground-Truth Evaluation and Investigation of View Invariances. In: Schwenker, F., Scherer, S., Morency, LP. (eds) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction. MPRSS 2014. Lecture Notes in Computer Science, vol 8869. Springer, Cham. https://doi.org/10.1007/978-3-319-14899-1_8


  • DOI: https://doi.org/10.1007/978-3-319-14899-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14898-4

  • Online ISBN: 978-3-319-14899-1
