An embedded implementation of CNN-based hand detection and orientation estimation algorithm

  • Original Paper
  • Machine Vision and Applications

Abstract

Hand detection is an essential step in many tasks, including HCI applications. However, robustly detecting hands against cluttered backgrounds, under motion blur, or in changing light remains a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection, but at a high computational cost. In this paper, we propose a lightweight CNN that uses a modified MobileNet as the feature extractor together with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps at various resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across feature levels. To estimate hand orientation accurately with a CNN, we estimate the projections of two orthogonal vectors onto the horizontal and vertical axes and then recover the size and orientation of a bounding box that exactly encloses the hand. To deploy the detection algorithm on the Jetson TK1 embedded platform, we optimize the implementations of the network's building modules. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection) reaches 83.2% average precision at 139 FPS on an NVIDIA Titan X, outperforming previous methods in both accuracy and efficiency. The embedded implementation of our algorithm runs at 16 FPS, which largely meets the requirement of real-time processing.
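The recovery of a rotated bounding box from two orthogonal vectors can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so the following is an assumption: the network is taken to output a box center plus the horizontal and vertical projections of two orthogonal half-axis vectors, one along the hand and one across it. The function name `recover_rotated_box` and its arguments are hypothetical and are not taken from the authors' code.

```python
import math


def recover_rotated_box(cx, cy, v1, v2):
    """Recover a rotated box from a center and two orthogonal half-axis vectors.

    Assumed parameterization (not confirmed by the abstract):
      v1 = (v1x, v1y): projections of the half-axis along the hand direction
      v2 = (v2x, v2y): projections of the half-axis across the hand
    Returns (length, width, angle_in_radians, four_corner_points).
    """
    v1x, v1y = v1
    v2x, v2y = v2

    # Side lengths are twice the norms of the half-axis vectors.
    length = 2.0 * math.hypot(v1x, v1y)
    width = 2.0 * math.hypot(v2x, v2y)

    # Orientation is the direction of v1 relative to the horizontal axis.
    angle = math.atan2(v1y, v1x)

    # Corners are the center plus/minus each half-axis vector.
    corners = [
        (cx + v1x + v2x, cy + v1y + v2y),
        (cx + v1x - v2x, cy + v1y - v2y),
        (cx - v1x - v2x, cy - v1y - v2y),
        (cx - v1x + v2x, cy - v1y + v2y),
    ]
    return length, width, angle, corners


# Example with v1 and v2 orthogonal: 3 * (-2) + 4 * 1.5 == 0.
length, width, angle, corners = recover_rotated_box(10.0, 20.0, (3.0, 4.0), (-2.0, 1.5))
print(length, width, math.degrees(angle))
print(corners)
```

Regressing vector components instead of an angle is often motivated by avoiding the wrap-around discontinuity of angular targets; the abstract does not state whether that is the authors' reason.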

Acknowledgements

This research was supported by the Provincial Natural Science Foundation of Jiangsu Province (Grant No. BK20181141), Key Science and Technology Projects in Jiangsu Province (Grant No. BE2018002-2), and the National Science and Technology Major Project (Grant No. 2017-ZX01030101).

Author information

Corresponding author

Correspondence to Zhi Qi.

About this article

Cite this article

Yang, L., Qi, Z., Liu, Z. et al. An embedded implementation of CNN-based hand detection and orientation estimation algorithm. Machine Vision and Applications 30, 1071–1082 (2019). https://doi.org/10.1007/s00138-019-01038-4

  • DOI: https://doi.org/10.1007/s00138-019-01038-4
