
System for multimodal data acquisition for human action recognition

Published in Multimedia Tools and Applications


Abstract

Multimodal data is increasingly used for human action recognition, owing to advances in machine learning methods and the development of new types of sensors. Acquiring the data required by such solutions is often troublesome, and proper tools for this process are hard to find. In this paper, we present a new toolkit for multimodal data acquisition. We address and discuss issues concerning the synchronization of data from multiple sensors, the optimization of the initial processing of raw data, and the design of a user interface for efficiently recording large databases. The system was verified in a setup consisting of three types of sensors: a Kinect 2, two PS3Eye cameras, and an accelerometer glove. The accuracy of the synchronization and the performance of the initial processing proved suitable for human action acquisition and recognition. The system was used to acquire an extensive database of sign language gestures. User feedback, which we also evaluate in the paper, indicated that the recording process is efficient. The system is publicly available, both as a standalone application and as source code, and can easily be customized to any type of sensor setup.
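
The toolkit's actual synchronization method is described in the full text. As a rough, hypothetical illustration of the general idea, the Python sketch below aligns samples from several sensor streams by picking, for each frame of a reference stream, the temporally nearest sample from every other stream on a common clock. All names here (SensorSample, align_streams) and the 20 ms skew threshold are assumptions made for this example, not taken from the toolkit; streams are assumed non-empty and sorted by timestamp.

from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class SensorSample:
    timestamp: float  # seconds on a common host clock
    data: Any

def nearest(samples: List[SensorSample], t: float) -> SensorSample:
    """Return the sample whose timestamp is closest to t.
    Rebuilds the timestamp list per call for clarity, not speed."""
    times = [s.timestamp for s in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0]
    if i == len(samples):
        return samples[-1]
    before, after = samples[i - 1], samples[i]
    return before if t - before.timestamp <= after.timestamp - t else after

def align_streams(reference: List[SensorSample],
                  others: Dict[str, List[SensorSample]],
                  max_skew: float = 0.02) -> List[Dict[str, SensorSample]]:
    """For each reference frame, gather the nearest sample from every
    other stream; drop frames whose best match exceeds max_skew seconds."""
    aligned = []
    for ref in reference:
        row = {"reference": ref}
        ok = True
        for name, stream in others.items():
            match = nearest(stream, ref.timestamp)
            if abs(match.timestamp - ref.timestamp) > max_skew:
                ok = False  # this frame cannot be synchronized closely enough
                break
            row[name] = match
        if ok:
            aligned.append(row)
    return aligned

Nearest-timestamp matching of this kind works only when all sensors are timestamped against one shared clock; sensors with independent clocks would first need an offset or drift estimate.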


Notes

  1. The multimodal data acquisition toolkit is available at https://github.com/fmal-pl/MultiSourceAcquisition.

  2. CL-Eye Platform SDK Homepage: https://codelaboratories.com/products/eye/sdk/


Acknowledgments

This work was supported by the Polish National Centre for Research and Development (Applied Research Program) under Grant PBS2/B3/21/2013, titled “Virtual sign language translator.”

Author information


Corresponding author

Correspondence to Filip Malawski.


About this article


Cite this article

Malawski, F., Gałka, J. System for multimodal data acquisition for human action recognition. Multimed Tools Appl 77, 23825–23850 (2018). https://doi.org/10.1007/s11042-018-5696-z

