
Human interaction recognition using spatial-temporal salient feature

Published in Multimedia Tools and Applications

Abstract

Depth sensors are widely used today and have had a great impact on object pose estimation, camera tracking, human action analysis, and scene reconstruction. This paper presents a novel method for human interaction recognition based on 3D skeleton data captured by the Kinect sensor, using a hierarchical spatial-temporal saliency-based representation. Hierarchical saliency is conceptualized as Salient Actions at the highest level, determined by the initial movement in an interaction; Salient Points at the middle level, determined by a single time point uniquely identified for all instances of a Salient Action; and Salient Joints at the lowest level, determined by the greatest positional changes of human joints within a Salient Action sequence. Given the interaction saliency at these different levels, several types of features, such as spatial displacement and direction relations, are introduced based on action characteristics. Since few test datasets are publicly accessible, we created a new dataset of eight interaction types, named K3HI, using the Microsoft Kinect. The method was evaluated with a multi-class Support Vector Machine (SVM) classifier. The experimental results demonstrate that the hierarchical saliency-based representation achieves an average recognition accuracy of 90.29%, outperforming methods that use other features.
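To make the pipeline described in the abstract concrete, here is a minimal sketch, not the authors' implementation, of the lowest level of the hierarchy: selecting Salient Joints as the joints with the greatest positional change over a skeleton sequence, building simple spatial-displacement and direction features from them, and training a multi-class SVM. The joint count, the number of salient joints retained, the exact feature construction, and the function names (salient_joints, interaction_features, train_svm) are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_JOINTS = 30  # assumed: 15 Kinect skeleton joints per person, two interacting people


def salient_joints(seq, k=5):
    """Indices of the k joints with the largest total positional change.

    seq: array of shape (T, N_JOINTS, 3) with 3D joint positions over T frames.
    """
    per_frame_motion = np.linalg.norm(np.diff(seq, axis=0), axis=2)  # (T-1, N_JOINTS)
    total_motion = per_frame_motion.sum(axis=0)                      # (N_JOINTS,)
    return np.argsort(total_motion)[-k:]


def interaction_features(seq, k=5):
    """Spatial displacement magnitudes and start-to-end directions of salient joints."""
    idx = salient_joints(seq, k)
    start, end = seq[0, idx], seq[-1, idx]               # (k, 3) each
    disp = np.linalg.norm(end - start, axis=1)           # spatial displacement
    direction = (end - start) / (disp[:, None] + 1e-8)   # unit direction relations
    return np.concatenate([disp, direction.ravel()])     # (4k,) feature vector


def train_svm(sequences, labels):
    """Fit a one-vs-rest multi-class SVM on per-sequence feature vectors."""
    X = np.stack([interaction_features(s) for s in sequences])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", decision_function_shape="ovr"))
    clf.fit(X, labels)
    return clf
```

With eight interaction classes, as in K3HI, the fitted classifier's predict method maps each new skeleton sequence's feature vector to one of the eight interaction labels.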




Acknowledgments

We are grateful to the volunteers who helped capture the data. This research is supported by the National Key R&D Program of China (No. 2016YFB0502204), the National Key Technology R&D Program (No. 2015BAK03B04), the Funds for the Central Universities (No. 413000010), the Open Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (No. 16(03)), the Guangxi Higher Education Undergraduate Teaching Reform Project, Category A (2016JGA258), the Opening Foundation of the Key Laboratory of Environment Change and Resources Use in Beibu Gulf, Ministry of Education (Guangxi Teachers Education University), and the Guangxi Key Laboratory of Earth Surface Processes and Intelligent Simulation (Guangxi Teachers Education University) (No. GTEU-KLOP-K1704).

Author information


Corresponding author

Correspondence to Shaohua Wang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hu, T., Zhu, X., Wang, S. et al. Human interaction recognition using spatial-temporal salient feature. Multimed Tools Appl 78, 28715–28735 (2019). https://doi.org/10.1007/s11042-018-6074-6


