VMASS: Massive Dataset of Multi-camera Video for Learning, Classification and Recognition of Human Actions

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8398)

Abstract

Expanding the capabilities of intelligent surveillance systems and advancing research in human motion analysis require massive amounts of video data, both for training learning methods and classifiers and for testing solutions under realistic conditions. Although many publicly available video sequences are intended for training and testing, existing video datasets are inadequate for real-world problems. Their scenes and acted-out human behaviors have low realism, the datasets are relatively small, their resolution is low, and in some cases their video quality is poor.

This article presents VMASS, a large-volume dataset of high-definition video sequences. The dataset is continuously updated with data acquired from multiple cameras that monitor high-activity urban areas. The article describes the dataset together with its acquisition and continuous updating processes and its sequence annotation process. It also compares VMASS with other available video datasets of similar purpose. The video data collected so far exceeds 4000 hours, 540 million frames and 2 million recorded events. Of these, 3500 events have been annotated manually using about 150 event types.
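The counts above allow a quick back-of-the-envelope check. The following minimal Python sketch uses only the figures stated in the abstract; the constant and function names are hypothetical. The recorded-event count is a lower bound ("exceeds 2 million"), so dividing the 3500 annotated events by it gives only an upper bound on the share of recorded events that carry a manual label. The per-type average assumes that the annotations are spread evenly over the roughly 150 event types, which the abstract does not claim.

```python
# Figures stated in the VMASS abstract. "exceeds" makes the recorded-event
# count a lower bound; the number of event types is approximate ("about 150").
RECORDED_EVENTS_MIN = 2_000_000
ANNOTATED_EVENTS = 3_500
EVENT_TYPES_APPROX = 150


def annotated_share_upper_bound(annotated: int, recorded_min: int) -> float:
    """Largest possible fraction of recorded events that are manually annotated.

    Dividing by a lower bound on the denominator yields an upper bound.
    """
    return annotated / recorded_min


def mean_events_per_type(annotated: int, n_types: int) -> float:
    """Average annotated events per type, under a hypothetical even spread."""
    return annotated / n_types


if __name__ == "__main__":
    share = annotated_share_upper_bound(ANNOTATED_EVENTS, RECORDED_EVENTS_MIN)
    per_type = mean_events_per_type(ANNOTATED_EVENTS, EVENT_TYPES_APPROX)
    print(f"annotated share <= {share:.3%}")
    print(f"about {per_type:.1f} annotated events per type")
```

With these inputs, the annotated share is at most about 0.175% of recorded events. Under an even spread, there would be roughly 23 annotated events per type on average.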

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kulbacki, M., Segen, J., Wereszczyński, K., Gudyś, A. (2014). VMASS: Massive Dataset of Multi-camera Video for Learning, Classification and Recognition of Human Actions. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds) Intelligent Information and Database Systems. ACIIDS 2014. Lecture Notes in Computer Science, vol 8398. Springer, Cham. https://doi.org/10.1007/978-3-319-05458-2_58

  • DOI: https://doi.org/10.1007/978-3-319-05458-2_58

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05457-5

  • Online ISBN: 978-3-319-05458-2

  • eBook Packages: Computer Science, Computer Science (R0)
