
System for multimodal data acquisition for human action recognition

Published in Multimedia Tools and Applications


Abstract

Multimodal data is increasingly used for human action recognition, owing to advances in machine learning methods and the development of new types of sensors. Acquiring the data required by such solutions is often troublesome, and proper tools for this process are hard to find. In this paper, we present a new toolkit for multimodal data acquisition. We address and discuss issues concerning the synchronization of data from multiple sensors, the optimization of the initial processing of raw data, and the design of a user interface for efficiently recording large databases. The system was verified in a setup consisting of three types of sensors: a Kinect 2, two PS3Eye cameras, and an accelerometer glove. The accuracy of the synchronization and the performance of the initial processing proved suitable for human action acquisition and recognition. The system was used to acquire an extensive database of sign language gestures. User feedback, which we also evaluate in the paper, indicated that the recording process is efficient. The system is publicly available, both as a standalone application and as source code, and can easily be customized to any type of sensor setup.
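
The toolkit's actual synchronization method is described in the full text. As a rough, hypothetical illustration of the general idea, the Python sketch below aligns samples from several sensor streams by picking, for each frame of a reference stream, the temporally nearest sample from every other stream on a common clock. All names here (SensorSample, align_streams) and the 20 ms skew threshold are assumptions made for this example, not taken from the toolkit; streams are assumed non-empty and sorted by timestamp.

from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class SensorSample:
    timestamp: float  # seconds on a common host clock
    data: Any

def nearest(samples: List[SensorSample], t: float) -> SensorSample:
    """Return the sample whose timestamp is closest to t.
    Rebuilds the timestamp list per call for clarity, not speed."""
    times = [s.timestamp for s in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0]
    if i == len(samples):
        return samples[-1]
    before, after = samples[i - 1], samples[i]
    return before if t - before.timestamp <= after.timestamp - t else after

def align_streams(reference: List[SensorSample],
                  others: Dict[str, List[SensorSample]],
                  max_skew: float = 0.02) -> List[Dict[str, SensorSample]]:
    """For each reference frame, gather the nearest sample from every
    other stream; drop frames whose best match exceeds max_skew seconds."""
    aligned = []
    for ref in reference:
        row = {"reference": ref}
        ok = True
        for name, stream in others.items():
            match = nearest(stream, ref.timestamp)
            if abs(match.timestamp - ref.timestamp) > max_skew:
                ok = False  # this frame cannot be synchronized closely enough
                break
            row[name] = match
        if ok:
            aligned.append(row)
    return aligned

Nearest-timestamp matching of this kind works only when all sensors are timestamped against one shared clock; sensors with independent clocks would first need an offset or drift estimate.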


Notes

  1. The multimodal data acquisition toolkit is available at https://github.com/fmal-pl/MultiSourceAcquisition.

  2. CL-Eye Platform SDK Homepage: https://codelaboratories.com/products/eye/sdk/


Acknowledgments

This work was supported by the Polish National Centre for Research and Development (Applied Research Program) under Grant PBS2/B3/21/2013, titled “Virtual sign language translator.”

Author information


Corresponding author

Correspondence to Filip Malawski.


About this article


Cite this article

Malawski, F., Gałka, J. System for multimodal data acquisition for human action recognition. Multimed Tools Appl 77, 23825–23850 (2018). https://doi.org/10.1007/s11042-018-5696-z

