
MMA: a multi-view and multi-modality benchmark dataset for human action recognition

Multimedia Tools and Applications

Abstract

Human action recognition is an active research topic in both the computer vision and machine learning communities, with broad applications including surveillance, biometrics, and human-computer interaction. Although several well-known action datasets have been released over the past decades, they still have limitations: restricted numbers of action categories and samples, few camera views, and little variety in scenarios. Moreover, most of them are designed for only a subset of the relevant learning problems, such as the single-view, cross-view, and multi-task learning problems. In this paper, we introduce a multi-view, multi-modality benchmark dataset for human action recognition (abbreviated MMA). MMA consists of 7080 action samples from 25 action categories, including 15 single-subject actions and 10 double-subject interactive actions, captured from three views in two different scenarios. Further, we systematically benchmark state-of-the-art approaches on MMA with respect to all three learning problems using different temporal-spatial feature representations. Experimental results demonstrate that MMA is challenging for all three learning problems due to significant intra-class variations, occlusion, view and scene variations, and multiple similar action categories. Meanwhile, we provide baselines for the evaluation of existing state-of-the-art algorithms.
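The abstract distinguishes three evaluation protocols over the same 25 categories and three camera views: single-view (train and test within one view), cross-view (train on one view, test on another), and multi-task (each view treated as a related task). As a rough illustration of how those protocols relate to the dataset structure, here is a minimal Python sketch; the `Sample` fields, view names, and split ratio are our own assumptions for illustration, since the page does not specify a file format or an official split.

```python
# Hypothetical sketch of the three evaluation protocols described in the
# abstract. Field names, view labels, and the 70/30 split are assumptions,
# not the authors' official protocol.
from dataclasses import dataclass
from itertools import permutations

VIEWS = ("view1", "view2", "view3")   # assumption: three camera views
NUM_CATEGORIES = 25                   # 15 single-subject + 10 interactive

@dataclass(frozen=True)
class Sample:
    category: int   # 0..24
    view: str       # one of VIEWS
    scenario: int   # 0 or 1 (two scenarios)
    clip_id: int

def single_view_split(samples, view, test_ratio=0.3):
    """Single-view learning: train and test within the same camera view."""
    pool = [s for s in samples if s.view == view]
    cut = int(len(pool) * (1 - test_ratio))
    return pool[:cut], pool[cut:]

def cross_view_splits(samples):
    """Cross-view learning: train on one view, test on a different view,
    for every ordered pair of distinct views."""
    for src, tgt in permutations(VIEWS, 2):
        train = [s for s in samples if s.view == src]
        test = [s for s in samples if s.view == tgt]
        yield (src, tgt), train, test

def multi_task_splits(samples):
    """Multi-task learning: each view is one task sharing the same labels."""
    return {v: [s for s in samples if s.view == v] for v in VIEWS}

if __name__ == "__main__":
    # Toy sample list standing in for the 7080 real clips.
    samples = [Sample(category=c, view=v, scenario=s, clip_id=i)
               for i, (c, v, s) in enumerate(
                   (c, v, s) for c in range(NUM_CATEGORIES)
                             for v in VIEWS for s in (0, 1))]
    train, test = single_view_split(samples, "view1")
    print(f"single-view: {len(train)} train / {len(test)} test")
    for (src, tgt), tr, te in cross_view_splits(samples):
        print(f"cross-view {src} -> {tgt}: {len(tr)} train / {len(te)} test")
```

The point of the sketch is only the split logic: the cross-view protocol never shares a camera view between train and test, which is what makes it harder than the single-view setting benchmarked in the paper.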



Author information


Corresponding author

Correspondence to Zan Gao.

Additional information

This work was supported in part by the National Natural Science Foundation of China (No. 61572357, No. 61202168) and the Tianjin Municipal Natural Science Foundation (No. 14JCZDJC31700, No. 13JCQNJC0040).


About this article


Cite this article

Gao, Z., Han, Tt., Zhang, H. et al. MMA: a multi-view and multi-modality benchmark dataset for human action recognition. Multimed Tools Appl 77, 29383–29404 (2018). https://doi.org/10.1007/s11042-018-5833-8

