Abstract
Hand gestures are becoming an important means of communication between humans and machines in an era of fast-paced urbanization. This paper introduces a new standard dataset for hand gesture recognition, Static HAnd PosturE (SHAPE), with adequate size, variation, and practicality. Compared with previous datasets, ours offers more classes, subjects, or scenes. The SHAPE dataset is also one of the first to focus on Asian subjects performing Asian hand gestures. It contains more than 34,000 images collected from 20 distinct subjects with different clothes and backgrounds. A recognition architecture is also presented to investigate the proposed dataset. The architecture consists of two phases: a hand detection phase for preprocessing, and a classification phase built on customized state-of-the-art deep neural network models. This paper investigates not only highly accurate but also lightweight hand gesture recognition models suitable for resource-constrained devices such as portable edge devices. A promising application of this work is a human–machine interface for small devices that lack the space for a keyboard or a mouse. Our experiments show that the proposed architecture obtains high accuracy on the self-built dataset. Details of our dataset can be seen online at https://users.soict.hust.edu.vn/linhdt/dataset/
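To make the two-phase architecture concrete, the sketch below shows how the phases fit together in PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the detection phase is stood in for by an already-computed bounding box (a real detector such as a YOLO variant would supply it), the classifier is a torchvision MobileNetV3, one of the lightweight backbone families this line of work draws on, and the class count is a placeholder.

```python
# Minimal sketch of the two-phase pipeline: phase 1 detects the hand
# region, phase 2 classifies the cropped posture. Backbone choice, class
# count, and the recognize() helper are illustrative assumptions.
import torch
from torchvision import models, transforms
from PIL import Image

NUM_CLASSES = 32  # assumption: replace with the actual number of SHAPE classes

# Phase 2: a lightweight classifier; the final layer is resized to the
# gesture classes (weights=None means randomly initialized, untrained).
classifier = models.mobilenet_v3_small(weights=None)
classifier.classifier[-1] = torch.nn.Linear(
    classifier.classifier[-1].in_features, NUM_CLASSES)
classifier.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def recognize(image: Image.Image, hand_box: tuple) -> int:
    """Classify the gesture inside a detected hand box (x1, y1, x2, y2).

    hand_box stands in for the output of the detection phase; wiring in
    a real detector is omitted here.
    """
    crop = image.crop(hand_box)              # phase 1 output: hand region
    batch = preprocess(crop).unsqueeze(0)    # add batch dimension
    with torch.no_grad():
        logits = classifier(batch)           # phase 2: classify the crop
    return int(logits.argmax(dim=1))
```

In the full architecture, the detector runs first on the raw frame and passes each hand crop to the classifier; cropping before classification is what keeps the classifier small enough for portable edge devices.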
Data availability
Details of our dataset can be found online at https://users.soict.hust.edu.vn/linhdt/dataset/. The dataset is available from the corresponding author upon reasonable request.
Acknowledgements
This work was supported by a collaboration between Hanoi University of Science and Technology and Naver Corporation.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest with regard to this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dang, T.L., Nguyen, H.T., Dao, D.M. et al. SHAPE: a dataset for hand gesture recognition. Neural Comput & Applic 34, 21849–21862 (2022). https://doi.org/10.1007/s00521-022-07651-1