Efficient 2D viewpoint combination for human action recognition

  • Industrial and Commercial Application
Pattern Analysis and Applications

Abstract

Recognizing human actions from a single viewpoint is hampered by self-occlusion and by occlusion from other objects. Using multiple cameras can help overcome these issues, but it remains an open question how to use the information from all viewpoints efficiently to improve performance. Researchers have reconstructed 3D models from multiple views to reduce dependency on viewpoint, but this 3D approach is often computationally expensive. Moreover, the quality of each view influences the overall model, and the reconstruction is limited to volumes where the views overlap. In this paper, we propose a novel method to efficiently combine 2D data from different viewpoints. Spatio-temporal features are extracted from each viewpoint and then used in a bag-of-words framework to form histograms. Two codebook sizes are used. The similarity between the resulting histograms is measured with the Histogram Intersection kernel and with the RBF kernel based on the \(\chi ^2\) distance. Lastly, we combine all the basic kernels generated by the different choices of viewpoint, feature type, codebook size and kernel type. The final kernel is a linear combination of the basic kernels, with weights determined by an optimization process. For higher accuracy, a separate set of kernel weights is computed for each binary SVM classifier. Our method not only combines the information from multiple viewpoints efficiently, but also improves performance by mapping features into various kernel spaces. The efficiency of the proposed method is demonstrated by testing on two commonly used multi-view human action datasets. Moreover, several experiments show the contribution of each part of the method to the overall performance.
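
To make the kernel combination described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes L1-normalized bag-of-words histograms per viewpoint, one common convention for the \(\chi ^2\) distance, and uniform placeholder weights in place of the optimized weights the paper learns. All function names, the `gamma` parameter and the toy data are hypothetical.

```python
import numpy as np


def histogram_intersection_kernel(A, B):
    """K[i, j] = sum_d min(A[i, d], B[j, d]) for histogram rows of A and B."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)


def chi2_rbf_kernel(A, B, gamma=1.0, eps=1e-12):
    """K[i, j] = exp(-gamma * chi2(A[i], B[j])).

    Assumed convention: chi2(a, b) = sum_d (a_d - b_d)^2 / (a_d + b_d).
    eps only guards against division by zero for empty bins.
    """
    diff = A[:, None, :] - B[None, :, :]
    total = A[:, None, :] + B[None, :, :]
    chi2 = (diff ** 2 / (total + eps)).sum(axis=2)
    return np.exp(-gamma * chi2)


def combine_kernels(kernels, weights):
    """Return the linear combination sum_m weights[m] * kernels[m]."""
    weights = np.asarray(weights, dtype=float)
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per basic kernel")
    if np.any(weights < 0):
        raise ValueError("weights are assumed non-negative in this sketch")
    return sum(w * K for w, K in zip(weights, kernels))


# Toy data (hypothetical): 6 video clips, 2 viewpoints, 20-word codebook.
rng = np.random.default_rng(0)
views = []
for _ in range(2):
    counts = rng.integers(1, 10, size=(6, 20)).astype(float)
    views.append(counts / counts.sum(axis=1, keepdims=True))  # L1-normalize

# One basic kernel per (viewpoint, kernel type) pair: 2 x 2 = 4 kernels.
basic_kernels = []
for hist in views:
    basic_kernels.append(histogram_intersection_kernel(hist, hist))
    basic_kernels.append(chi2_rbf_kernel(hist, hist))

# Placeholder: uniform weights. The paper instead optimizes a separate
# weight vector for each binary SVM classifier.
uniform = np.full(len(basic_kernels), 1.0 / len(basic_kernels))
K_final = combine_kernels(basic_kernels, uniform)
print(K_final.shape)
```

A Gram matrix built this way can be handed to any SVM implementation that accepts precomputed kernels. Adding feature types and codebook sizes only lengthens the list of basic kernels; the combination step stays the same.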

Notes

  1. Note that \(\alpha _i\ne 0\) only for support vectors.

  2. The dataset is accessible via http://4drepository.inrialpes.fr/public/viewgroup/6.

  3. The actions are standing still, clapping, waving one arm, waving two arms, punching, jogging, jumping jack, kicking, bending and bowling.
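
For context on Note 1, the standard kernel SVM decision function with a linearly combined kernel can be written as below. Only \(\alpha _i\) appears in the notes; the remaining symbols (\(y_i\), \(b\), \(N\), \(M\), \(\beta _m\), \(K_m\)) and the non-negativity of the weights are assumed notation for illustration, not taken from the article.

\[
f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{N} \alpha _i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \sum_{m=1}^{M} \beta _m\, K_m(\mathbf{x}_i, \mathbf{x}),
\quad \beta _m \ge 0 .
\]

Because \(\alpha _i = 0\) for every training sample that is not a support vector, the sum over \(i\) only needs to run over the support vectors at test time.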

Author information

Corresponding author

Correspondence to Deepu Rajan.

Cite this article

Saghafi, B., Rajan, D. & Li, W. Efficient 2D viewpoint combination for human action recognition. Pattern Anal Applic 19, 563–577 (2016). https://doi.org/10.1007/s10044-016-0537-z

  • DOI: https://doi.org/10.1007/s10044-016-0537-z
