Robust relative attributes for human action recognition

  • Industrial and Commercial Application
  • Published in Pattern Analysis and Applications

Abstract

High-level semantic features are important for recognizing human actions. Recently, relative attributes, which describe relative relationships between classes, have been proposed as one such high-level semantic feature and have shown promising performance. However, their training process is very sensitive to noise, and it is not robust in zero-shot learning. In this paper, to overcome these drawbacks, we propose a robust learning framework using relative attributes for human action recognition. We simultaneously add Sigmoid and Gaussian envelopes to the loss objective; in this way, the influence of outliers is greatly reduced during optimization, which improves accuracy. In addition, we adopt Gaussian mixture models to better fit the distribution of actions in rank-score space. Correspondingly, a novel transfer strategy is proposed to estimate the parameters of the Gaussian mixture models for unseen classes. Our method is verified on three challenging datasets (KTH, UIUC and HOLLYWOOD2), and the experimental results demonstrate that it outperforms previous methods in both zero-shot classification and the traditional recognition task for human action recognition.
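The paper's exact formulation is not reproduced on this page, but the two ideas the abstract names can be sketched minimally: a pairwise ranking loss whose hinge penalty is squashed by a sigmoid envelope so outlier pairs contribute a bounded penalty, and classification of rank scores by per-class Gaussian densities (here a single diagonal Gaussian stands in for the paper's mixtures). All function names, parameters, and the single-Gaussian simplification are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def robust_rank_loss(w, X_hi, X_lo, alpha=2.0):
    """Pairwise ranking loss with a sigmoid envelope.

    Each row of X_hi should score higher than the matching row of X_lo
    under the linear ranking function w. The raw hinge penalty
    max(0, 1 - w.(x_hi - x_lo)) is squashed through a sigmoid and
    rescaled to [0, 1), so a mislabeled (outlier) pair contributes a
    bounded penalty instead of dominating the objective.
    """
    margins = (X_hi - X_lo) @ w                # want margins >= 1
    hinge = np.maximum(0.0, 1.0 - margins)     # standard ranking hinge
    return np.mean(2.0 / (1.0 + np.exp(-alpha * hinge)) - 1.0)

def gaussian_log_score(scores, mu, sigma):
    """Log-density of a rank-score vector under a diagonal Gaussian."""
    return -0.5 * np.sum(((scores - mu) / sigma) ** 2
                         + np.log(2.0 * np.pi * sigma ** 2))

def zero_shot_classify(scores, class_params):
    """Assign a sample to the class whose (possibly transferred)
    Gaussian gives its attribute rank scores the highest density."""
    return max(class_params,
               key=lambda c: gaussian_log_score(scores, *class_params[c]))
```

In this sketch, an unseen class could be handled in the spirit of the paper's transfer strategy by interpolating its `(mu, sigma)` from the fitted parameters of related seen classes, since classes are ordered along each attribute's rank axis.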



Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants No. 60933010, No. 61172103 and No. 61271429, and by the National High-tech R&D Program of China (863 Program) under Grant No. 2012AA041312.

Author information


Corresponding author

Correspondence to Chunheng Wang.


About this article

Cite this article

Zhang, Z., Wang, C., Xiao, B. et al. Robust relative attributes for human action recognition. Pattern Anal Applic 18, 157–171 (2015). https://doi.org/10.1007/s10044-013-0349-3

