
A survey of depth and inertial sensor fusion for human action recognition

Published in: Multimedia Tools and Applications

Abstract

A number of review and survey articles have previously appeared on human action recognition in which either vision sensors or inertial sensors were used individually. Because each sensor modality has its own limitations, a number of published papers have shown that fusing vision and inertial sensor data improves recognition accuracy. This survey article provides an overview of recent investigations in which vision and inertial sensors are used together, and simultaneously, to perform human action recognition more effectively. The thrust of this survey is the utilization of depth cameras and inertial sensors, as these two types of sensors are cost-effective, commercially available, and, more significantly, both provide 3D human action data. An overview of the components necessary to fuse data from depth and inertial sensors is provided, together with a review of publicly available datasets in which depth and inertial data were captured simultaneously.
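To make the feature-level fusion idea discussed in this survey concrete, the following is a minimal sketch, not taken from any surveyed paper: per-modality features (here random placeholders standing in for, e.g., depth motion map descriptors and accelerometer/gyroscope statistics) are normalized, concatenated, and passed to an SVM classifier. The feature dimensions, class count, and classifier settings are illustrative assumptions.

```python
# Minimal sketch of feature-level fusion for action recognition.
# Feature dimensions and classifier settings are hypothetical.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_samples, n_classes = 200, 8
depth_feats = rng.standard_normal((n_samples, 100))    # stand-in for depth-based descriptors
inertial_feats = rng.standard_normal((n_samples, 30))  # stand-in for accel/gyro window statistics
labels = rng.integers(0, n_classes, n_samples)

def znorm(x):
    # Normalize each modality separately so neither dominates the fused vector.
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

# Feature-level fusion: concatenate the normalized per-modality features.
fused = np.hstack([znorm(depth_feats), znorm(inertial_feats)])

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```

Decision-level fusion, the other broad strategy covered in the survey, would instead train one classifier per modality and combine their outputs (e.g., by score averaging or Dempster-Shafer evidence combination); the choice between the two is a recurring design question in the fused systems reviewed here.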



Notes

  1. http://mocap.cs.cmu.edu/

  2. http://tele-immersion.citris-uc.org/berkeley_mhad

  3. http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html

  4. http://www.utdallas.edu/~kehtar/UTD-MHAD.html

  5. http://cvip.computing.dundee.ac.uk/datasets/foodpreparation/50salads/

  6. https://project.eia-fr.ch/chairgest/Pages/Overview.aspx

  7. http://www.tlc.dii.univpm.it/blog/databases4kinect#IDFall

  8. http://mmv.eecs.qmul.ac.uk/mmgc2013/
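The datasets listed above provide depth and inertial streams captured simultaneously, but typically at different rates: depth cameras commonly run near 30 fps while wearable inertial sensors often sample at 50 Hz or higher. A common preprocessing step before any fusion is therefore temporal alignment of the two streams. The following is a minimal sketch under assumed rates and variable names (both streams are assumed timestamped against a common clock), using linear interpolation to resample the inertial signal onto the depth frame times.

```python
# Minimal sketch of temporal alignment between a depth stream (~30 fps)
# and an inertial stream (~50 Hz). Rates, durations, and variable names
# are illustrative assumptions.
import numpy as np

depth_ts = np.arange(0, 5, 1 / 30.0)  # depth frame timestamps (seconds)
imu_ts = np.arange(0, 5, 1 / 50.0)    # inertial sample timestamps (seconds)
imu_accel = np.random.default_rng(1).standard_normal((imu_ts.size, 3))  # placeholder 3-axis data

# Linearly interpolate each accelerometer axis onto the depth frame times,
# so every depth frame has exactly one matched inertial sample.
aligned = np.column_stack(
    [np.interp(depth_ts, imu_ts, imu_accel[:, k]) for k in range(3)]
)
print(aligned.shape)  # (number of depth frames, 3)
```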


Acknowledgments

This work was supported in part by the National Science Foundation, under grant CNS-1150079. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations.

Author information

Correspondence to Chen Chen.


Cite this article

Chen, C., Jafari, R. & Kehtarnavaz, N. A survey of depth and inertial sensor fusion for human action recognition. Multimed Tools Appl 76, 4405–4425 (2017). https://doi.org/10.1007/s11042-015-3177-1

