Action detection fusing multiple Kinects and a WIMU: an application to in-home assistive technology for the elderly

  • Original Paper
  • Published in Machine Vision and Applications

Abstract

We present a vision-inertial system which combines two RGB-Depth devices with a wearable inertial movement unit (WIMU) in order to detect activities of daily living. From the multi-view videos, we extract dense trajectories enriched with a histogram-of-normals description computed from the depth cue and bag them into multi-view codebooks. During the later classification step, a multi-class support vector machine with an RBF-\(\chi^2\) kernel combines the descriptions at kernel level. In order to perform action detection in the videos, a sliding-window approach is utilized. In parallel, we extract acceleration, rotation-angle, and jerk features from the inertial data collected by the wearable placed on the user’s dominant wrist. During gesture spotting, dynamic time warping is applied and the alignment costs to a set of pre-selected gesture sub-classes are thresholded to determine possible detections. The outputs of the two modules are combined in a late-fusion fashion. The system is validated in a real-case scenario with elderly people from an elder home. Learning-based fusion results improve on those of the single modalities, demonstrating the success of such a multimodal approach.
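For reference, a common form of such a kernel-level combination of multiple bag-of-words channels (one histogram per view and per cue) is the multi-channel RBF-\(\chi^2\) kernel familiar from the dense-trajectories literature; this is a sketch rather than the paper's exact formulation, and the per-channel normalizer \(A_c\) (typically the mean \(\chi^2\) distance between training samples of channel \(c\)) is an assumption:

\[
K(x_i, x_j) = \exp\left(-\sum_{c} \frac{1}{2 A_c}\, D_{\chi^2}\!\left(x_i^{c}, x_j^{c}\right)\right),
\qquad
D_{\chi^2}(u, v) = \sum_{k} \frac{(u_k - v_k)^2}{u_k + v_k}.
\]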

Notes

  1. This is not stated in the published manuscript, but in an errata document. Check [46] and the errata document for more detail: http://jhmdb.is.tue.mpg.de/show_file?filename=Errata_JHMDB_ICCV_2013.pdf.

  2. The concatenation of the Viewpoint Feature Histogram (VFH) and Camera Roll Histogram (CRH).

  3. Models are class-representative instances against which test gestures are compared in order to compute the DTW matrices; see Sect. 4.2.4 for more details.

  4. Dense optical flow is computed using [32]; a minimal call sketch is given after these notes.

  5. These two quantities define the size of the dynamic time warping matrix, i.e., \(l_g \times l_M\).
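As a pointer for note 4 above, dense optical flow in the sense of [32] is available in off-the-shelf libraries. The following sketch uses OpenCV's Farnebäck routine on two consecutive frames; the function name and the parameter values are illustrative assumptions, not the settings used in the paper.

import cv2

def farneback_flow(prev_frame, next_frame):
    # Dense optical flow between two consecutive frames (Farnebäck [32]).
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Illustrative (assumed) parameter values, not taken from the paper.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    return flow  # H x W x 2 array of per-pixel (dx, dy) displacements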

References

  1. Adlam, T., Faulkner, R., Orpwood, R., Jones, K., Macijauskiene, J., Budraitiene, A.: The installation and support of internationally distributed equipment for people with dementia. IEEE Trans. Inf. Technol. Biomed. 8(3), 253–257 (2004)

  2. Akl, A., Valaee, S.: Accelerometer-based gesture recognition via dynamic-time warping, affinity propagation, & compressive sensing. In: 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 2270–2273. IEEE (2010)

  3. Alexandre, L.A.: 3D descriptors for object and category recognition: a comparative evaluation. In: Workshop on Color-Depth Camera Fusion in Robotics at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, vol. 1. Citeseer (2012)

  4. Amft, O., Junker, H., Troster, G.: Detection of eating and drinking arm gestures using inertial body-worn sensors. In: Proceedings of the 9th IEEE International Symposium on Wearable Computers, 2005, pp. 160–163 (2005)

  5. Avci, A., Bosch, S., Marin-Perianu, M., Marin-Perianu, R., Havinga, P.: Activity recognition using inertial sensing for healthcare, wellbeing and sports applications: A survey. In: 2010 23rd International Conference on Architecture of computing systems (ARCS), pp. 1–10. VDE (2010)

  6. Bagalà, F., Becker, C., Cappello, A., Chiari, L., Aminian, K., Hausdorff, J.M., Zijlstra, W., Klenk, J.: Evaluation of accelerometer-based fall detection algorithms on real-world falls. PLoS One 7(5), e37062 (2012)

  7. Bagheri, M., Gao, Q., Escalera, S., Clapes, A., Nasrollahi, K., Holte, M.B., Moeslund, T.B.: Keep it accurate and diverse: Enhancing action recognition performance by ensemble learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 22–29 (2015)

  8. Banerjee, T., Keller, J.M., Skubic, M., Stone, E.: Day or night activity recognition from video using fuzzy clustering techniques. IEEE Trans. Fuzzy Syst. 22(3), 483–493 (2014)

  9. Bao, L., Intille, S.S.: Activity recognition from user-annotated acceleration data. In: Pervasive Computing, pp. 1–17. Springer (2004)

  10. Barbosa, I.B., Cristani, M., Del Bue, A., Bazzani, L., Murino, V.: Re-identification with rgb-d sensors. In: Computer Vision–ECCV 2012. Workshops and Demonstrations, pp. 433–442. Springer (2012)

  11. Bautista, M.A., Hernández-Vela, A., Ponce, V., Perez-Sala, X., Baró, X., Pujol, O., Angulo, C., Escalera, S.: Probability-based dynamic time warping for gesture recognition on rgb-d data. In: Advances in depth image analysis and applications, pp. 126–135. Springer (2013)

  12. Ben Hadj Mohamed, A., Val, T., Andrieux, L., Kachouri, A.: Assisting people with disabilities through kinect sensors into a smart house. In: 2013 International Conference on Computer Medical Applications (ICCMA), pp. 1–5. IEEE (2013)

  13. Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: Tenth IEEE International Conference on Computer Vision, 2005. ICCV 2005. vol. 2, pp. 1395–1402. IEEE (2005)

  14. Bo, A., Hayashibe, M., Poignet, P.: Joint angle estimation in rehabilitation with inertial sensors and its integration with kinect. In: EMBC’11: 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3479–3483. IEEE (2011)

  15. Bobick, A.F., Davis, J.W.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 23(3), 257–267 (2001)

  16. Booranrom, Y., Watanapa, B., Mongkolnam, P.: Smart bedroom for elderly using kinect. In: Computer Science and Engineering Conference (ICSEC), 2014 International, pp. 427–432. IEEE (2014)

  17. Botia, J.A., Villa, A., Palma, J.: Ambient assisted living system for in-home monitoring of healthy independent elders. Expert Syst. Appl. 39(9), 8136–8148 (2012)

  18. Bouchard, K., Bilodeau, J.S., Fortin-Simard, D., Gaboury, S., Bouchard, B., Bouzouane, A.: Human activity recognition in smart homes based on passive RFID localization. In: Proceedings of the 7th International Conference on Pervasive Technologies Related to Assistive Environments, p. 1 (2014)

  19. Brendel, W., Todorovic, S.: Activities as time series of human postures. In: Computer Vision–ECCV 2010, pp. 721–734. Springer (2010)

  20. Bulling, A., Blanke, U., Schiele, B.: A tutorial on human activity recognition using body-worn inertial sensors. ACM Comput. Surv. 46(3), 1–33 (2014)

  21. Casale, P.: Approximate ensemble methods for physical activity recognition applications. ELCVIA 13(2), 22–23 (2014)

  22. Chang, S.F., Ellis, D., Jiang, W., Lee, K., Yanagawa, A., Loui, A.C., Luo, J.: Large-scale multimodal semantic concept detection for consumer video. In: Proceedings of the International Workshop on Workshop on Multimedia Information Retrieval, pp. 255–264. ACM (2007)

  23. Chattopadhyay, P., Roy, A., Sural, S., Mukhopadhyay, J.: Pose depth volume extraction from rgb-d streams for frontal gait recognition. J. Vis. Commun. Image Represent. 25(1), 53–63 (2014)

  24. Chen, C.C., Aggarwal, J.: Modeling human activities as speech. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3425–3432. IEEE (2011)

  25. Clapés, A., Reyes, M., Escalera, S.: Multi-modal user identification and object recognition surveillance system. Pattern Recognit. Lett. 34(7), 799–808 (2013)

  26. Crispim, C.F., Bathrinarayanan, V., Fosty, B., Konig, A., Romdhane, R., Thonnat, M., Bremond, F.: Evaluation of a monitoring system for event recognition of older people. In: 10th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 165–170. IEEE (2013)

  27. Daponte, P., De Vito, L., Sementa, C.: A wireless-based home rehabilitation system for monitoring 3D movements. MeMeA 2013—IEEE International Symposium on Medical Measurements and Applications, Proceedings, pp. 282–287 (2013). https://doi.org/10.1109/MeMeA.2013.6549753

  28. Delachaux, B., Rebetez, J., Perez-Uribe, A., Mejia, H.F.S.: Indoor activity recognition by combining one-vs.-all neural network classifiers exploiting wearable and depth sensors. In: Advances in Computational Intelligence, pp. 216–223. Springer (2013)

  29. Dell’Acqua, P., Klompstra, L.V., Jaarsma, T., Samini, A.: An assistive tool for monitoring physical activities in older adults. In: 2013 IEEE 2nd International Conference on Serious Games and Applications for Health (SeGAH), pp. 1–6. IEEE (2013)

  30. Dubois, A., Charpillet, F.: Human activities recognition with rgb-depth camera using hmm. In: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 4666–4669. IEEE (2013)

  31. Escalera, S., Baró, X., González, J., Bautista, M.A., Madadi, M., Reyes, M., Ponce, V., Escalante, H.J., Shotton, J., Guyon, I.: ChaLearn looking at people challenge 2014: dataset and results. In: ECCV Workshops (2014)

  32. Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Image analysis, pp. 363–370. Springer (2003)

  33. Fenu, G., Steri, G.: IMU based post-traumatic rehabilitation assessment. In: 2010 3rd International Symposium on Applied Sciences in Biomedical and Communication Technologies, ISABEL 2010 pp. 1–4 (2010). https://doi.org/10.1109/ISABEL.2010.5702813

  34. Fernandez-Sanchez, E.J., Diaz, J., Ros, E.: Background subtraction based on color and depth using active sensors. Sensors 13(7), 8895–8915 (2013)

  35. Gaidon, A., Harchaoui, Z., Schmid, C.: Temporal localization of actions with actoms. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2782–2795 (2013)

  36. Georgi, M., Amma, C., Schultz, T.: Recognizing hand and finger gestures with IMU based motion and EMG based muscle activity sensing. In: Proceedings of the international conference on bio-inspired systems and signal processing, pp. 99–108 (2015). https://doi.org/10.5220/0005276900990108, http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0005276900990108

  37. Gkioxari, G., Malik, J.: Finding action tubes (2014). arXiv preprint arXiv:1411.6031

  38. Golby, C., Raja, V., Hundt, G.L., Badiyani, S.: A low cost ‘activities of daily living’ assessment system for the continual assessment of post-stroke patients, from inpatient/outpatient rehabilitation through to telerehabilitation. Successes and Failures in Telehealth. http://wrap.warwick.ac.uk/42341/ (2011)

  39. Gunes, H., Piccardi, M.: Affect recognition from face and body: early fusion vs. late fusion. In: 2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 4, pp. 3437–3443 (2005). https://doi.org/10.1109/ICSMC.2005.1571679

  40. Helten, T., Muller, M., Seidel, H.P., Theobalt, C.: Real-time body tracking with one depth camera and inertial sensors. In: 2013 IEEE International Conference on computer vision (ICCV), pp. 1105–1112. IEEE (2013)

  41. Hernández-Vela, A., Bautista, M.A., Perez-Sala, X., Ponce, V., Baró, X., Pujol, O., Angulo, C., Escalera, S.: Bovdw: bag-of-visual-and-depth-words for gesture recognition. In: 2012 21st international conference on pattern recognition (ICPR), pp. 449–452. IEEE (2012)

  42. Hondori, H.M., Khademi, M., Lopes, C.V.: Monitoring intake gestures using sensor fusion (microsoft kinect and inertial sensors) for smart home tele-rehab setting. In: 2012 1st Annual IEEE Healthcare Innovation Conference (2012)

  43. Hongeng, S., Nevatia, R., Bremond, F.: Video-based event recognition: activity representation and probabilistic recognition methods. Comput. Vis. Image Underst. 96(2), 129–162 (2004)

  44. Jafari, R., Li, W., Bajcsy, R., Glaser, S., Sastry, S.: Physical activity monitoring for assisted living at home. In: 4th International Workshop on Wearable and Implantable Body Sensor Networks (BSN 2007), pp. 213–219. Springer (2007)

  45. Jain, M., Van Gemert, J., Jégou, H., Bouthemy, P., Snoek, C.G.: Action localization with tubelets from motion. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 740–747. IEEE (2014)

  46. Jhuang, H., Gall, J., Zuffi, S., Schmid, C., Black, M.J.: Towards understanding action recognition. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 3192–3199. IEEE (2013)

  47. Junker, H., Amft, O., Lukowicz, P., Tröster, G.: Gesture spotting with body-worn inertial sensors to detect user activities. Pattern Recognit. 41(6), 2010–2024 (2008). https://doi.org/10.1016/j.patcog.2007.11.016

  48. Karantonis, D.M., Narayanan, M.R., Mathie, M., Lovell, N.H., Celler, B.G.: Implementation of a real-time human movement classifier using a triaxial accelerometer for ambulatory monitoring. IEEE Trans. Inf. Technol. Biomed. 10(1), 156–167 (2006)

  49. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR), pp. 1725–1732. IEEE (2014)

  50. Ke, Y., Sukthankar, R., Hebert, M.: Volumetric features for video event detection. Int. J. Comput. Vis. 88(3), 339–362 (2010)

  51. Kim, J., Yang, S., Gerla, M.: Stroketrack: wireless inertial motion tracking of human arms for stroke telerehabilitation. In: Proceedings of the First ACM Workshop on Mobile Systems, Applications, and Services for Healthcare, p. 4. ACM (2011)

  52. Kim, T.K., Cipolla, R.: Canonical correlation analysis of video volume tensors for action categorization and detection. IEEE Trans. Pattern Anal. Mach. Intell. 31(8), 1415–1428 (2009)

  53. Kong, W., Sessa, S., Cosentino, S., Zecca, M., Saito, K., Wang, C., Imtiaz, U., Lin, Z., Bartolomeo, L., Ishii, H., Ikai, T., Takanishi, A.: Development of a real-time IMU-based motion capture system for gait rehabilitation. In: Robotics and Biomimetics (ROBIO), IEEE International Conference on 2013, pp. 2100–2105 (2013)

  54. Kratz, S., Back, M.: Towards accurate automatic segmentation of IMU-tracked motion gestures. In: Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, ACM, pp. 1337–1342 (2015)

  55. Kwolek, B., Kepski, M.: Improving fall detection by the use of depth sensor and accelerometer. Neurocomputing 168, 637–645 (2015)

  56. Laptev, I.: On space-time interest points. Int. J. Comput. Vis. 64(2–3), 107–123 (2005)

  57. Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pp. 1–8. IEEE (2008)

  58. Lara, O.D., Labrador, M.A.: A survey on human activity recognition using wearable sensors. IEEE Commun. Surv. Tutor. 15(3), 1192–1209 (2013)

  59. Lei, J., Ren, X., Fox, D.: Fine-grained kitchen activity recognition using rgb-d. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pp. 208–211. ACM (2012)

  60. Li, B.Y., Mian, A.S., Liu, W., Krishna, A.: Using kinect for face recognition under varying poses, expressions, illumination and disguise. In: 2013 IEEE Workshop on Applications of Computer Vision (WACV), pp. 186–192. IEEE (2013)

  61. Li, W., Zhang, Z., Liu, Z.: Action recognition based on a bag of 3d points. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 9–14. IEEE (2010)

  62. Liang, B., Zheng, L.: Spatio-temporal pyramid cuboid matching for action recognition using depth maps. In: International Conference on Image Processing 2015 (ICIP 2015) (2015)

  63. Liu, J., Zhong, L., Wickramasuriya, J., Vasudevan, V.: uwave: accelerometer-based personalized gesture recognition and its applications. Pervasive Mobile Comput. 5(6), 657–675 (2009)

  64. Liu, K., Chen, C., Jafari, R., Kehtarnavaz, N.: Fusion of inertial and depth sensor data for robust hand gesture recognition. IEEE Sens. J. 14(6), 1898–1903 (2014)

  65. Lombriser, C., Bharatula, N.B., Roggen, D., Tröster, G.: On-body activity recognition in a dynamic sensor network. In: Proceedings of the ICST 2nd International Conference on Body Area Networks, p. 17. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering) (2007)

  66. Luinge, H.J., Veltink, P.H.: Measuring orientation of human body segments using miniature gyroscopes and accelerometers. Med. Biol. Eng. Comput. 43(2), 273–282 (2005)

  67. Mace, D., Gao, W., Coskun, A.: Accelerometer-based hand gesture recognition using feature weighted Naïve Bayesian classifiers and dynamic time warping. In: Proceedings of the Companion Publication of the 2013 International Conference on Intelligent user interfaces companion, pp. 83–84. ACM (2013)

  68. Memon, M., Wagner, S.R., Pedersen, C.F., Beevi, F.H.A., Hansen, F.O.: Ambient assisted living healthcare frameworks, platforms, standards, and quality attributes. Sensors 14(3), 4312–4341 (2014)

  69. Mogelmose, A., Bahnsen, C., Moeslund, T.B., Clapés, A., Escalera, S.: Tri-modal person re-identification with rgb, depth and thermal features. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 301–307. IEEE (2013)

  70. Mubashir, M., Shao, L., Seed, L.: A survey on fall detection: principles and approaches. Neurocomputing 100, 144–152 (2013)

  71. Nait-Charif, H., McKenna, S.J.: Activity summarisation and fall detection in a supportive home environment. In: Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004, vol. 4, pp. 323–326. IEEE (2004)

  72. Natarajan, P., Wu, S., Vitaladevuni, S., Zhuang, X., Tsakalidis, S., Park, U., Prasad, R., Natarajan, P.: Multimodal feature fusion for robust event detection in web videos. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1298–1305 (2012). https://doi.org/10.1109/CVPR.2012.6247814

  73. Ni, B., Wang, G., Moulin, P.: Rgbd-hudaact: a color-depth video database for human daily activity recognition. In: Fossati, A., Gall, J., Grabner, H., Ren, X., Konolige, K. (eds.) Consumer Depth Cameras for Computer Vision, pp. 193–208. Springer, Berlin (2013)

  74. Nikisins, O., Nasrollahi, K., Greitans, M., Moeslund, T.B.: Rgb-dt based face recognition. In: 2014 22nd International Conference on Pattern Recognition (ICPR), pp. 1716–1721. IEEE (2014)

  75. Oliver, N., Garg, A., Horvitz, E.: Layered representations for learning and inferring office activity from multiple sensory channels. Comput. Vis. Image Underst. 96(2), 163–180 (2004)

  76. Pardo, À., Clapés, A., Escalera, S., Pujol, O.: Actions in context: system for people with dementia. In: Nin, J., Villatoro, D. (eds.) Citizen in Sensor Networks, pp. 3–14. Springer, Berlin (2014)

  77. Piyathilaka, L., Kodagoda, S.: Gaussian mixture based hmm for human daily activity recognition using 3d skeleton features. In: 2013 8th IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 567–572. IEEE (2013)

  78. Pylvänäinen, T.: Accelerometer based gesture recognition using continuous HMMs. In: Pattern Recognition and Image Analysis, pp. 639–646. Springer (2005)

  79. Rashidi, P., Cook, D.J.: Keeping the resident in the loop: adapting the smart home to the user. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 39(5), 949–959 (2009)

  80. Reyes, M., Domínguez, G., Escalera, S.: Feature weighting in dynamic time warping for gesture recognition in depth data. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1182–1188. IEEE (2011)

  81. Ribeiro, P.C., Santos-Victor, J.: Human activity recognition from video: modeling, feature selection and classification architecture. In: Proceedings of International Workshop on Human Activity Recognition and Modelling, pp. 61–78. Citeseer (2005)

  82. Riboni, D., Bettini, C.: COSAR: hybrid reasoning for context-aware activity recognition. Pers. Ubiquitous Comput. 15(3), 271–289 (2011). https://doi.org/10.1007/s00779-010-0331-7

  83. Rodriguez, M.D., Ahmed, J., Shah, M.: Action mach a spatio-temporal maximum average correlation height filter for action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pp. 1–8. IEEE (2008)

  84. Sadanand, S., Corso, J.J.: Action bank: A high-level representation of activity in video. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1234–1241. IEEE (2012)

  85. Saha, S., Pal, M., Konar, A., Janarthanan, R.: Neural network based gesture recognition for elderly health care using kinect sensor. In: Swarm, Evolutionary, and Memetic Computing, pp. 376–386. Springer (2013)

  86. Schindler, K., Van Gool, L.: Action snippets: How many frames does human action recognition require? In: IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pp. 1–8. IEEE (2008)

  87. Schüldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local svm approach. In: Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004, vol. 3, pp. 32–36. IEEE (2004)

  88. Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A., Cook, M., Moore, R.: Real-time human pose recognition in parts from single depth images. Commun. ACM 56(1), 116–124 (2013)

  89. Snoek, C.G., Worring, M., Smeulders, A.W.: Early versus late fusion in semantic video analysis. In: Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 399–402. ACM (2005)

  90. Sung, J., Ponce, C., Selman, B., Saxena, A.: Unstructured human activity detection from rgbd images. In: 2012 IEEE International Conference on Robotics and Automation (ICRA), pp. 842–849. IEEE (2012)

  91. Tang, K., Fei-Fei, L., Koller, D.: Learning latent temporal structure for complex event detection. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1250–1257. IEEE (2012)

  92. Tang, S., Wang, X., Lv, X., Han, T.X., Keller, J., He, Z., Skubic, M., Lao, S.: Histogram of oriented normal vectors for object recognition with a depth sensor. In: Computer Vision–ACCV 2012, pp. 525–538. Springer (2012)

  93. Ullah, M.M., Parizi, S.N., Laptev, I.: Improving bag-of-features action recognition with non-local cues. In: BMVC, vol. 10, pp. 95–1. Citeseer (2010)

  94. Van Hoof, J., Kort, H., Rutten, P., Duijnstee, M.: Ageing-in-place with the use of ambient intelligence technology: perspectives of older users. Int. J. Med. Inform. 80(5), 310–331 (2011)

  95. Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 480–492 (2012)

  96. Vintsyuk, T.K.: Speech discrimination by dynamic programming. Cybern. Syst. Anal. 4(1), 52–57 (1968)

  97. Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3169–3176. IEEE (2011)

  98. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 3551–3558. IEEE (2013)

  99. Weinzaepfel, P., Harchaoui, Z., Schmid, C.: Learning to track for spatio-temporal action localization. (2015). arXiv preprint arXiv:1506.01929

  100. World Health Organization, Alzheimer’s Disease International: Dementia: a public health priority. World Health Organization, Geneva (2012)

  101. Wu, D., Zhu, F., Shao, L.: One shot learning gesture recognition from rgbd images. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 7–12. IEEE (2012)

  102. Wu, J., Osuntogun, A., Choudhury, T., Philipose, M., Rehg, J.M.: A scalable approach to activity recognition based on object use. In: IEEE 11th International Conference on Computer Vision, 2007. ICCV 2007, pp. 1–8. IEEE (2007)

  103. Xiao, Y., Zhao, G., Yuan, J., Thalmann, D.: Activity recognition in unconstrained rgb-d video using 3d trajectories. In: SIGGRAPH Asia 2014 Autonomous Virtual Humans and Social Robot for Telepresence, SA ’14, pp. 4:1–4:4. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2668956.2668961

  104. Xu, Z., Yang, Y., Hauptmann, A.G.: A discriminative cnn video representation for event detection (2014). arXiv preprint arXiv:1411.4006

  105. Yang, J.Y., Wang, J.S., Chen, Y.P.: Using acceleration measurements for activity recognition: an effective learning algorithm for constructing neural classifiers. Pattern Recognit. Lett. 29(16), 2213–2220 (2008)

  106. Zhang, B., Jiang, S., Wei, D., Marschollek, M., Zhang, W.: State of the art in gait analysis using wearable sensors for healthcare applications. In: 2012 IEEE/ACIS 11th International Conference on Computer and Information Science (ICIS), pp. 213–218. IEEE (2012)

  107. Zhang, C., Tian, Y.: Rgb-d camera-based daily living activity recognition. J. Comput. Vis. Image Process. 2(4), 12 (2012)

  108. Zhao, Y., Liu, Z., Yang, L., Cheng, H.: Combing rgb and depth map features for human activity recognition. In: Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific, pp. 1–4. IEEE (2012)

  109. Zhou, F., Jiao, J., Chen, S., Zhang, D.: A case-driven ambient intelligence system for elderly in-home assistance applications. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 41(2), 179–189 (2011)

  110. Zhu, C., Sheng, W.: Multi-sensor fusion for human daily activity recognition in robot-assisted living. In: Proceedings of the 4th ACM/IEEE International Conference on Human Robot Interaction, pp. 303–304. ACM (2009)

  111. Zou, Q., Ni, L., Wang, Q., Li, Q., Wang, S.: Robust gait recognition by integrating inertial and rgbd sensors. IEEE Trans. Cybern. PP(99), 1–15 (2018). https://doi.org/10.1109/TCYB.2017.2682280

Acknowledgements

This work was partly supported by the Spanish project TIN2016-74946-P and the CERCA Programme/Generalitat de Catalunya. The work of Albert Clapés was supported by SUR-DEC of the Generalitat de Catalunya and FSE. We would also like to thank the SARQuavitae Claret elder home and all the people who volunteered for the recording of the dataset.

Author information

Corresponding author

Correspondence to Sergio Escalera.

Appendix: Wearable Module feature comparison

In Sect. 4.2, several features are detailed. They are used to describe the performed gestures in terms of different magnitudes. Here, an extensive comparison of the different feature combinations is presented, with the objective of determining which is the most suitable for gesture recognition on the SARQuavitae Claret dataset.
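For illustration, the sketch below shows how two of these magnitudes can be derived from the raw inertial streams: jerk as the discrete time-derivative of the acceleration, and an orientation-angle estimate obtained with a standard complementary filter. The function names and the blending coefficient alpha are assumptions for the example, not the paper's implementation.

import numpy as np

def jerk(acc, dt):
    # Jerk: discrete time-derivative of the acceleration signal (N x 3 array).
    return np.diff(acc, axis=0) / dt

def complementary_filter(acc_angle, gyro_rate, dt, alpha=0.98):
    # Fuse the gyroscope rate with the accelerometer-derived tilt angle.
    # acc_angle: per-sample angle from the accelerometer (rad)
    # gyro_rate: angular velocity from the gyroscope (rad/s)
    # alpha:     blending coefficient (assumed value)
    angle = np.zeros_like(acc_angle)
    angle[0] = acc_angle[0]
    for t in range(1, len(acc_angle)):
        angle[t] = alpha * (angle[t - 1] + gyro_rate[t] * dt) \
                   + (1 - alpha) * acc_angle[t]
    return angle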

Fig. 19: In a, raw accelerometer \(+\) jerk. In b, sorted accelerometer \(+\) complementary filter.

Fig. 20: In a, sorted accelerometer \(+\) jerk. In b, complementary filter \(+\) jerk.

Fig. 21: In a, raw accelerometer \(+\) sorted accelerometer. In b, raw accelerometer \(+\) complementary filter.

Fig. 22: In a, raw accelerometer \(+\) sorted accelerometer \(+\) complementary filter. In b, raw accelerometer \(+\) sorted accelerometer \(+\) jerk.

Fig. 23: In a, sorted accelerometer \(+\) complementary filter \(+\) jerk. In b, raw accelerometer \(+\) sorted accelerometer \(+\) complementary filter \(+\) jerk.

Given the set of features described in the article, all their possible combinations have been generated. For each feature set, we have computed the distances between all pairs of pre-segmented gestures, using DTW as the distance measure.
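A minimal sketch of this pairwise distance is given below, assuming each pre-segmented gesture is an \(l \times d\) array of per-frame feature vectors; the Euclidean local cost and the absence of a band constraint are assumptions of the example.

import numpy as np

def dtw_distance(gesture, model):
    # DTW alignment cost between two feature sequences of sizes l_g x d and l_M x d.
    l_g, l_m = len(gesture), len(model)
    # Local cost matrix of size l_g x l_M (cf. note 5).
    cost = np.linalg.norm(gesture[:, None, :] - model[None, :, :], axis=2)
    D = np.full((l_g + 1, l_m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, l_g + 1):
        for j in range(1, l_m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j],
                                               D[i, j - 1],
                                               D[i - 1, j - 1])
    return D[l_g, l_m]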

Each of the matrices represents the average distance between segmented gestures, computed using a leave-one-subject-out strategy. That is, for each subject, all of their gestures are compared to the gestures of the remaining subjects. Finally, the average of all the distances is computed. The objective is to find the combination that is most discriminative: the best features are those that minimize the distance within the same class (the diagonal of the matrices) while maximizing the distance between different classes.
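Under that protocol, the average class-to-class matrices can be assembled as sketched below; the data layout (a list of (subject, class, sequence) tuples) is an assumption, and dtw_distance refers to the previous sketch.

import numpy as np
from collections import defaultdict

def average_distance_matrix(samples, classes):
    # samples: list of (subject_id, class_label, feature_sequence) tuples.
    sums, counts = defaultdict(float), defaultdict(int)
    for subj, cls_a, seq_a in samples:
        for other_subj, cls_b, seq_b in samples:
            if other_subj == subj:
                continue  # leave-one-subject-out: skip the same subject's gestures
            sums[(cls_a, cls_b)] += dtw_distance(seq_a, seq_b)
            counts[(cls_a, cls_b)] += 1
    avg = np.zeros((len(classes), len(classes)))
    for i, ca in enumerate(classes):
        for j, cb in enumerate(classes):
            if counts[(ca, cb)]:
                avg[i, j] = sums[(ca, cb)] / counts[(ca, cb)]
    return avg  # low diagonal and high off-diagonal indicate discriminative features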

In most of the combinations shown, one can see that the “taking-pill” and “turn-page” gestures are easily confused, while “drink” and “spoonful” are, most of the time, distant from the other classes. Looking at the matrices in more detail, Figs. 19b, 20a, b, and 23a lead us to the conclusion that the raw accelerometer is crucial for representing the data correctly. Regarding the combinations using only a pair of features, Figs. 21a, b, and 19a show that they are usually enough to correctly discriminate the “drink” and “spoonful” classes, but not “taking-pill” and “turn-page.” The same happens in Fig. 22a, b, where “taking-pill” and “turn-page” are confused with “spoonful.” The only combination able to correctly discriminate “turn-page,” “drink,” and “spoonful” is the one shown in Fig. 23b. However, “taking-pill” remains close to the other classes, an effect of the variability in how that gesture is performed.

Finally, based on this analysis, we can state that the most suitable combination of features for the presented dataset is the one including the raw accelerometer, sorted accelerometer, complementary filter, and jerk. However, the distance matrices presented in this section hint at some issues related to intra-class variability. Our hypothesis is that this is due to the reduced amount of data available, which hinders the representativeness of the gestures.

Cite this article

Clapés, A., Pardo, À., Pujol Vila, O. et al. Action detection fusing multiple Kinects and a WIMU: an application to in-home assistive technology for the elderly. Machine Vision and Applications 29, 765–788 (2018). https://doi.org/10.1007/s00138-018-0931-1
