An embedded implementation of CNN-based hand detection and orientation estimation algorithm

Yang, Li; Qi, Zhi; Liu, Zeheng; Liu, Hao; Ling, Ming; Shi, Longxing; Liu, Xinning

doi:10.1007/s00138-019-01038-4

Original Paper
Published: 12 June 2019

Machine Vision and Applications, Volume 30, pages 1071–1082 (2019)

Li Yang¹, Zhi Qi¹, Zeheng Liu¹, Hao Liu¹, Ming Ling¹, Longxing Shi¹ & Xinning Liu¹

Abstract

Hand detection is an essential step in many tasks, including human-computer interaction (HCI) applications. However, robustly detecting hands against cluttered backgrounds, under motion blur, or in changing light remains a challenging problem. Recent CNN-based object detection methods have significantly improved the accuracy of hand detection, but at a high computational cost. In this paper, we propose a lightweight CNN that uses a modified MobileNet as the feature extractor together with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps at various resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across feature levels. To estimate hand orientation accurately with a CNN, we estimate the projections of two orthogonal vectors along the horizontal and vertical axes and then recover the size and orientation of a bounding box that exactly encloses the hand. To deploy the detection algorithm on the Jetson TK1 embedded platform, we optimize the implementations of the building modules of the network. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection) reaches 83.2% average precision at 139 FPS on an NVIDIA Titan X, outperforming previous methods in both accuracy and efficiency. The embedded implementation of our algorithm reaches a processing speed of 16 FPS, which largely meets the requirement for real-time processing.
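
The abstract does not spell out how the box is recovered from the two orthogonal vectors, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the network outputs, for each hand, a center (cx, cy) and two half-axis vectors v1 and v2, each given by its (dx, dy) projections on the horizontal and vertical image axes. The function name `recover_rotated_box` and this parameterization are hypothetical.

```python
import math


def recover_rotated_box(cx, cy, v1, v2):
    """Recover a rotated box from a center and two half-axis vectors.

    Assumption (not from the paper): v1 and v2 are (dx, dy) pairs that
    point from the box center to the midpoints of two adjacent sides.
    """
    # Side lengths are twice the lengths of the half-axis vectors.
    width = 2.0 * math.hypot(v1[0], v1[1])
    height = 2.0 * math.hypot(v2[0], v2[1])

    # Orientation is the angle of the first axis relative to the x axis.
    angle_deg = math.degrees(math.atan2(v1[1], v1[0]))

    # Each corner is the center plus or minus both half-axis vectors.
    corners = [
        (cx + s1 * v1[0] + s2 * v2[0], cy + s1 * v1[1] + s2 * v2[1])
        for s1, s2 in ((1, 1), (1, -1), (-1, -1), (-1, 1))
    ]
    return width, height, angle_deg, corners


# Example with two orthogonal vectors (their dot product is zero).
width, height, angle_deg, corners = recover_rotated_box(
    0.0, 0.0, (3.0, 4.0), (-2.0, 1.5)
)
```

One reason such a formulation may be attractive is that regressing vector projections avoids the wrap-around discontinuity of regressing an angle directly; the angle is obtained afterwards with `atan2`.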

Acknowledgements

This research was supported by the Provincial Natural Science Foundation of Jiangsu Province (Grant No. BK20181141), Key Science and Technology Projects in Jiangsu Province (Grant No. BE2018002-2), and the National Science and Technology Major Project (Grant No. 2017-ZX01030101).

Author information

Authors and Affiliations

National ASIC System Engineering Research Center, Southeast University, Nanjing, China
Li Yang, Zhi Qi, Zeheng Liu, Hao Liu, Ming Ling, Longxing Shi & Xinning Liu

Corresponding author

Correspondence to Zhi Qi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, L., Qi, Z., Liu, Z. et al. An embedded implementation of CNN-based hand detection and orientation estimation algorithm. Machine Vision and Applications 30, 1071–1082 (2019). https://doi.org/10.1007/s00138-019-01038-4

Received: 01 August 2018
Revised: 09 May 2019
Accepted: 23 May 2019
Published: 12 June 2019
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s00138-019-01038-4
