Efficient 2D viewpoint combination for human action recognition

Saghafi, Behrouz; Rajan, Deepu; Li, Wanqing

doi:10.1007/s10044-016-0537-z

Industrial and Commercial Application
Published: 05 March 2016

Volume 19, pages 563–577, (2016)
Pattern Analysis and Applications

Behrouz Saghafi¹,
Deepu Rajan² &
Wanqing Li³


Abstract

The ability to recognize human actions from a single viewpoint is affected by phenomena such as self-occlusion or occlusion by other objects. Incorporating multiple cameras can help overcome these issues. However, the question remains how to efficiently use information from all viewpoints to increase performance. Researchers have reconstructed a 3D model from multiple views to reduce dependency on viewpoint, but this 3D approach is often computationally expensive. Moreover, the quality of each view influences the overall model, and the reconstruction is limited to volumes where the views overlap. In this paper, we propose a novel method to efficiently combine 2D data from different viewpoints. Spatio-temporal features are extracted from each viewpoint and then used in a bag-of-words framework to form histograms. Two codebook sizes are used. The similarity between the obtained histograms is represented via the histogram intersection kernel as well as the RBF kernel with \(\chi^2\) distance. Lastly, we combine all the basic kernels generated by selecting different viewpoints, feature types, codebook sizes and kernel types. The final kernel is a linear combination of the basic kernels, weighted according to an optimization process. For higher accuracy, the sets of kernel weights are computed separately for each binary SVM classifier. Our method not only combines the information from multiple viewpoints efficiently, but also improves performance by mapping features into various kernel spaces. The efficiency of the proposed method is demonstrated by testing on two commonly used multi-view human action datasets. Moreover, several experiments indicate the contribution of each part of the method to the overall performance.
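The kernel construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the toy histograms, the `gamma` value, the particular \(\chi^2\) convention (without a factor of 1/2) and the fixed equal weights are all assumptions made for illustration. In the paper the weights come from an optimization process and are computed separately for each binary SVM classifier, which this sketch does not attempt.

```python
import numpy as np

def histogram_intersection_kernel(A, B):
    """K[i, j] = sum_k min(A[i, k], B[j, k]) for row-wise histograms."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

def chi2_rbf_kernel(A, B, gamma=1.0, eps=1e-12):
    """K[i, j] = exp(-gamma * chi2(A[i], B[j])).

    Uses the convention chi2(a, b) = sum_k (a_k - b_k)^2 / (a_k + b_k);
    other texts include a factor of 1/2 (assumed convention here).
    """
    diff = A[:, None, :] - B[None, :, :]
    total = A[:, None, :] + B[None, :, :]
    chi2 = (diff ** 2 / (total + eps)).sum(axis=2)  # eps guards empty bins
    return np.exp(-gamma * chi2)

def combine_kernels(kernels, weights):
    """Weighted linear combination of precomputed kernel matrices."""
    weights = np.asarray(weights, dtype=float)
    return sum(w * K for w, K in zip(weights, kernels))

# Toy bag-of-words histograms: 4 videos, codebook of 8 words (hypothetical data).
rng = np.random.default_rng(0)
H = rng.random((4, 8))
H /= H.sum(axis=1, keepdims=True)  # L1-normalise each histogram

K_hi = histogram_intersection_kernel(H, H)
K_chi = chi2_rbf_kernel(H, H)

# Placeholder equal weights; the paper learns these by optimization instead.
K = combine_kernels([K_hi, K_chi], [0.5, 0.5])
```

In a full pipeline there would be one such basic kernel per choice of viewpoint, feature type, codebook size and kernel type, and the combined matrix could be passed to an SVM that accepts precomputed kernels.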

Notes

Note that \(\alpha_i \ne 0\) only for support vectors.
The dataset is accessible via http://4drepository.inrialpes.fr/public/viewgroup/6.
The actions are standing still, clapping, waving one arm, waving two arms, punching, jogging, jumping jack, kicking, bending and bowling.
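
For context on the first note, the standard form of a kernel SVM decision function with a linearly combined kernel can be written as below. This is the textbook formulation, not an equation taken from the article. Only \(\alpha_i\) appears in the source; the symbols \(\beta_m\), \(M\), \(N\), \(y_i\) and \(b\) are assumed notation.

```latex
% Combined kernel: weighted sum of M basic kernels (weights \beta_m are assumed notation)
K(x, x') = \sum_{m=1}^{M} \beta_m \, K_m(x, x'), \qquad \beta_m \ge 0

% Binary SVM decision function over N training samples with labels y_i \in \{-1, +1\}
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i \, y_i \, K(x_i, x) + b \right)
```

Because \(\alpha_i = 0\) for every training sample that is not a support vector, the sum only needs to be evaluated over the support vectors.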

Author information

Authors and Affiliations

Centre for Multimedia and Network Technology, School of Computer Engineering, Nanyang Technological University, Singapore, 639798, Singapore
Behrouz Saghafi
School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
Deepu Rajan
Information and Communication Technology (ICT) Research Institute, University of Wollongong, Wollongong, NSW, 2522, Australia
Wanqing Li

Corresponding author

Correspondence to Deepu Rajan.

About this article

Cite this article

Saghafi, B., Rajan, D. & Li, W. Efficient 2D viewpoint combination for human action recognition. Pattern Anal Applic 19, 563–577 (2016). https://doi.org/10.1007/s10044-016-0537-z

Received: 14 July 2014
Accepted: 23 February 2016
Published: 05 March 2016
Issue Date: May 2016
DOI: https://doi.org/10.1007/s10044-016-0537-z
