Multi-scale predictions fusion for robust hand detection and classification

Multimedia Tools and Applications

Abstract

In this paper, we present a multi-scale predictions fusion region-based fully convolutional network (MSPF-RFCN) that robustly detects and classifies human hands under various challenging conditions. In our approach, the input image is passed through the proposed network to generate score maps based on multi-scale predictions fusion. The network is specifically designed to deal with small objects and uses an architecture based on region proposals generated at multiple scales. We evaluate our method on two challenging hand datasets, the Vision for Intelligent Vehicles and Applications (VIVA) Challenge and the Oxford hand dataset, and compare it against recent hand detection algorithms. The experimental results demonstrate that the proposed method achieves state-of-the-art detection for hands of various sizes.
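The abstract does not state how the predictions from different scales are combined, so the following is only an illustrative sketch of one simple possibility, not the MSPF-RFCN fusion itself. It assumes that each scale yields a 2-D score map, that coarser maps are brought to the finest resolution by nearest-neighbour upsampling, and that fusion is an element-wise average. The names `upsample_nearest` and `fuse_score_maps` are hypothetical and do not come from the paper.

```python
from typing import List

ScoreMap = List[List[float]]


def upsample_nearest(score_map: ScoreMap, factor: int) -> ScoreMap:
    """Nearest-neighbour upsampling: repeat every cell `factor` times per axis."""
    out: ScoreMap = []
    for row in score_map:
        wide = [value for value in row for _ in range(factor)]
        # Copy the widened row so the repeated rows do not share storage.
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_score_maps(maps: List[ScoreMap]) -> ScoreMap:
    """Average score maps after resizing them to the finest resolution.

    Assumption (not from the paper): the finest map's height is an integer
    multiple of every other map's height, and the same factor fits the width.
    """
    target_h = max(len(m) for m in maps)
    resized = []
    for m in maps:
        factor, remainder = divmod(target_h, len(m))
        if remainder != 0:
            raise ValueError("map heights must divide the finest height")
        resized.append(upsample_nearest(m, factor))

    target_w = len(resized[0][0])
    if any(len(r[0]) != target_w for r in resized):
        raise ValueError("resized maps must share the same width")

    count = len(resized)
    return [
        [sum(r[i][j] for r in resized) / count for j in range(target_w)]
        for i in range(target_h)
    ]


if __name__ == "__main__":
    fine = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 map from a fine scale
    coarse = [[0.5]]                 # 1x1 map from a coarse scale
    print(fuse_score_maps([fine, coarse]))
```

In this toy setting a small object that responds strongly only on the fine map still keeps a relatively high fused score, which is the intuition behind combining predictions across scales. A learned or weighted fusion would replace the plain average.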

Author information

Corresponding author

Correspondence to Yong Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ding, L., Wang, Y., Laganière, R. et al. Multi-scale predictions fusion for robust hand detection and classification. Multimed Tools Appl 78, 35633–35650 (2019). https://doi.org/10.1007/s11042-019-08080-4

  • DOI: https://doi.org/10.1007/s11042-019-08080-4
