
Recent evolution of modern datasets for human activity recognition: a deep survey

Regular Paper · Multimedia Systems

Abstract

Human activity recognition has been a central goal of computer vision since the field's inception and has advanced considerably in recent years. Recent approaches to the problem increasingly favour data-driven deep learning methods. To enable comparison of these methods, numerous datasets of labelled human activity have been created, varying widely in content and collection methodology. As the field has matured, these datasets have evolved considerably as well. In this paper, we classify and describe a range of datasets so that researchers can choose the most suitable benchmark for their domain. To this end, we propose a set of characteristics by which datasets may be compared, and we describe the recent developments that set modern datasets apart from those used in the past.
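
The survey proposes a set of characteristics by which datasets may be compared. As a rough illustration of how such a comparison might be organised in practice, the sketch below encodes a few benchmarks as small records and filters them against a researcher's requirements. The fields chosen and the example figures are approximations drawn from commonly reported statistics, not the paper's own taxonomy or numbers.

```python
from dataclasses import dataclass

@dataclass
class ActivityDataset:
    """A handful of comparison axes for a video activity-recognition benchmark.

    These fields are illustrative assumptions, not the survey's taxonomy.
    """
    name: str
    year: int
    num_classes: int
    num_clips: int          # approximate, as commonly reported
    source: str             # e.g. "YouTube", "movies", "staged"
    temporal_labels: bool   # clip-level only (False) vs. temporally localised (True)

# Widely cited benchmarks with approximate, commonly reported sizes.
DATASETS = [
    ActivityDataset("HMDB51", 2011, 51, 6_766, "movies/web", False),
    ActivityDataset("UCF101", 2012, 101, 13_320, "YouTube", False),
    ActivityDataset("ActivityNet", 2015, 200, 20_000, "YouTube", True),
    ActivityDataset("Kinetics-400", 2017, 400, 306_000, "YouTube", False),
]

def pick_benchmarks(datasets, min_classes=100, need_temporal=False):
    """Return candidates meeting simple requirements, largest first."""
    hits = [
        d for d in datasets
        if d.num_classes >= min_classes
        and (d.temporal_labels or not need_temporal)
    ]
    return sorted(hits, key=lambda d: d.num_clips, reverse=True)

if __name__ == "__main__":
    for d in pick_benchmarks(DATASETS, min_classes=100):
        print(f"{d.name} ({d.year}): {d.num_classes} classes, ~{d.num_clips:,} clips")
```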




Author information

Corresponding author

Correspondence to Roshan Singh.

Additional information

Communicated by L. Zhang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Singh, R., Sonawane, A. & Srivastava, R. Recent evolution of modern datasets for human activity recognition: a deep survey. Multimedia Systems 26, 83–106 (2020). https://doi.org/10.1007/s00530-019-00635-7

