Abstract
We present a new descriptor for activity recognition from skeleton data acquired by Kinect. Previous approaches tend to employ complex descriptors that require long computation times. In this study, we present an efficient and effective descriptor, which we name the Histogram of Oriented Velocity Vectors (HOVV). It is a scale-, speed-, and length-invariant descriptor for human actions represented by 3D skeletons acquired by Kinect. We describe a skeleton sequence with a 2D spatial histogram capturing the distribution of the orientations of the joint velocity vectors in a spherical coordinate system. We use three methods to classify actions represented by the HOVV descriptor: the k-nearest-neighbor classifier, Support Vector Machines, and Extreme Learning Machines. For cases in which the HOVV descriptor alone is insufficient, such as distinguishing actions that involve only tiny joint movements (e.g., “sit still”), we also incorporate a simple skeleton descriptor as a prior to the action descriptor. Through extensive experiments, we test our system under different configurations and demonstrate that our HOVV descriptor outperforms state-of-the-art methods: it requires much shorter computation time, owing to the simpler computations needed for feature extraction, while achieving higher recognition accuracy.
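The core idea of the abstract, a 2D histogram over the spherical orientations of joint velocity vectors, can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: the function name, the 8×8 bin layout, the L1 normalization, and the near-zero-velocity threshold are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def hovv_descriptor(joints, n_theta=8, n_phi=8):
    """Sketch of a Histogram-of-Oriented-Velocity-Vectors descriptor.

    joints: array of shape (T, J, 3) -- T frames, J skeleton joints,
            3D coordinates per joint (e.g., from Kinect).
    Returns a flattened, L1-normalized 2D histogram over the azimuth and
    elevation angles of the joint velocity vectors.
    """
    # Per-frame velocity vector of every joint: (T-1, J, 3) -> (N, 3).
    v = np.diff(joints, axis=0).reshape(-1, 3)
    norms = np.linalg.norm(v, axis=1)
    # Drop (near-)stationary joints; only the *orientation* of motion is kept,
    # which is what makes the descriptor speed- and length-invariant.
    mask = norms > 1e-8
    v, norms = v[mask], norms[mask]
    theta = np.arctan2(v[:, 1], v[:, 0])                    # azimuth in [-pi, pi]
    phi = np.arcsin(np.clip(v[:, 2] / norms, -1.0, 1.0))    # elevation in [-pi/2, pi/2]
    hist, _, _ = np.histogram2d(
        theta, phi, bins=[n_theta, n_phi],
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
    hist = hist.flatten()
    s = hist.sum()
    return hist / s if s > 0 else hist       # L1-normalize (frame-count invariance)
```

Normalizing the histogram by its total count makes descriptors of long and short sequences comparable; the resulting fixed-length vector can be fed directly to a k-NN, SVM, or ELM classifier as the paper describes.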
Boubou, S., Suzuki, E. Classifying actions based on histogram of oriented velocity vectors. J Intell Inf Syst 44, 49–65 (2015). https://doi.org/10.1007/s10844-014-0329-0