Abstract
Dynamic hand gesture recognition is a desirable alternative means of human-computer interaction. This paper presents a hand gesture recognition system designed to control the flight of unmanned aerial vehicles (UAVs). A data representation model is introduced that represents a dynamic gesture sequence by converting the 4-D spatiotemporal data to a 2-D matrix and a 1-D array. To train the system to recognize the designed gestures, skeleton data collected from a Leap Motion Controller are converted to these two data models. A training dataset of 9 124 samples and a testing dataset of 1 938 samples are created to train and test the three proposed deep learning neural networks: a 2-layer fully connected neural network, a 5-layer fully connected neural network and an 8-layer convolutional neural network. The static testing results show that the 2-layer fully connected neural network achieves an average accuracy of 96.7% on scaled datasets and 12.3% on non-scaled datasets. The 5-layer fully connected neural network achieves an average accuracy of 98.0% on scaled datasets and 89.1% on non-scaled datasets. The 8-layer convolutional neural network achieves an average accuracy of 89.6% on scaled datasets and 96.9% on non-scaled datasets. Testing on a drone-kit simulator and on a real drone shows that the system is feasible for drone flight control.
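The data representation described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: each frame is taken to hold an (x, y, z) position per skeleton joint, the 2-D matrix is taken to have one row per time step, the 1-D array is taken to be its row-major flattening, and "scaled" is taken to mean min-max scaling to [0, 1]. All function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # assumed (x, y, z) position of one joint
Frame = Sequence[Joint]              # all joints captured at one time step


def to_matrix(frames: Sequence[Frame]) -> List[List[float]]:
    """Stack frames as rows: one row per time step, 3 columns per joint."""
    return [[coord for joint in frame for coord in joint] for frame in frames]


def to_array(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the 2-D matrix row by row into a 1-D array."""
    return [value for row in matrix for value in row]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical gesture: two time steps, two joints (values are made up).
frames = [
    [(0.0, 10.0, 20.0), (5.0, 15.0, 25.0)],
    [(10.0, 20.0, 30.0), (15.0, 25.0, 40.0)],
]
matrix = to_matrix(frames)        # 2 rows (time steps) x 6 columns (coordinates)
array = to_array(matrix)          # 12 values in row-major order
scaled = min_max_scale(array)     # same 12 values rescaled to [0, 1]
```

Under these assumptions, the 2-D form would suit a convolutional network and the 1-D form a fully connected one, and scaling puts every coordinate in a common range regardless of the sensor's units.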
Author information
Additional information
Recommended by Associate Editor Xian-Dong Ma
Bin Hu received the B. Sc. degree in mechanical engineering from Xi’an Jiaotong University, China in 2000. He received two M. Sc. degrees in software engineering, one from Xi’an Jiaotong University, China in 2005 and one from Monmouth University, USA in 2018. From 2006 to 2016, he was an assistant professor at Xi’an University of Posts and Telecommunications, China. Currently, he is an adjunct professor in the Department of Computer Science at New Jersey City University, USA. He has published about 10 research papers in journals and conferences.
His research interests include software engineering, robotics, and wireless networking.
Jiacun Wang received the Ph. D. degree in computer engineering from Nanjing University of Science and Technology (NUST), China in 1991. He is currently a professor of software engineering at Monmouth University, USA. From 2001 to 2004, he was a member of scientific staff with Nortel Networks in Richardson, USA. Prior to joining Nortel, he was a research associate in the School of Computer Science, Florida International University (FIU) at Miami, USA. Prior to joining FIU, he was an associate professor at NUST, China. He authored Timed Petri Nets: Theory and Application (Kluwer, 1998), Real-time Embedded Systems (Wiley, 2018) and Formal Methods in Computer Science (CRC Press, 2019), edited Handbook of Finite State Based Models and Applications (CRC, 2012), and has published about 90 research papers in journals and conferences. He was an Associate Editor of IEEE Transactions on Systems, Man and Cybernetics, Part C. He has served as general chair, program chair, special sessions chair, or program committee member for many international conferences. He is a senior member of IEEE.
His research interests include software engineering, discrete event systems, formal methods, wireless networking, and real-time distributed systems.
About this article
Cite this article
Hu, B., Wang, J. Deep Learning Based Hand Gesture Recognition and UAV Flight Controls. Int. J. Autom. Comput. 17, 17–29 (2020). https://doi.org/10.1007/s11633-019-1194-7
DOI: https://doi.org/10.1007/s11633-019-1194-7