Abstract
In this paper, we present a multi-scale predictions fusion region-based fully convolutional network (MSPF-RFCN) that robustly detects and classifies human hands under various challenging conditions. In our approach, the input image is passed through the proposed network, which generates score maps based on the fusion of multi-scale predictions. The network is specifically designed to handle small objects and uses an architecture based on region proposals generated at multiple scales. We evaluate the method on two challenging hand datasets, the Vision for Intelligent Vehicles and Applications (VIVA) Challenge and the Oxford hand dataset, and compare it against recent hand detection algorithms. The experimental results demonstrate that the proposed method achieves state-of-the-art detection performance for hands of various sizes.
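The abstract does not state how predictions from different scales are combined. As a rough illustration only, the sketch below assumes one possible scheme: each score map is upsampled to the finest resolution by nearest-neighbour interpolation and the maps are then combined by a weighted average. The function names `upsample_nearest` and `fuse_score_maps`, the equal default weights, and the toy inputs are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence

ScoreMap = List[List[float]]


def upsample_nearest(score_map: ScoreMap, out_h: int, out_w: int) -> ScoreMap:
    """Resize a 2-D score map to (out_h, out_w) with nearest-neighbour sampling."""
    in_h, in_w = len(score_map), len(score_map[0])
    return [
        [score_map[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def fuse_score_maps(
    maps: Sequence[ScoreMap], weights: Optional[Sequence[float]] = None
) -> ScoreMap:
    """Fuse per-scale score maps by weighted averaging (an assumed fusion rule)."""
    if not maps:
        raise ValueError("at least one score map is required")
    if weights is None:
        # Assumption: every scale contributes equally.
        weights = [1.0 / len(maps)] * len(maps)
    if len(weights) != len(maps):
        raise ValueError("need exactly one weight per score map")

    # Bring every map to the resolution of the largest one before combining.
    out_h = max(len(m) for m in maps)
    out_w = max(len(m[0]) for m in maps)
    resized = [upsample_nearest(m, out_h, out_w) for m in maps]

    return [
        [sum(w * r[y][x] for w, r in zip(weights, resized)) for x in range(out_w)]
        for y in range(out_h)
    ]


# Toy example: a coarse 1x2 map and a fine 2x4 map for a single class.
coarse = [[0.2, 0.8]]
fine = [[0.0, 0.4, 0.6, 1.0], [0.0, 0.4, 0.6, 1.0]]
fused = fuse_score_maps([coarse, fine])
```

In this toy case the fused map has the resolution of the fine map, and each cell is the mean of the fine score and the coarse score covering the same location. A learned fusion (for example, a convolution over concatenated maps) would be an equally plausible reading of the abstract.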
Cite this article
Ding, L., Wang, Y., Laganière, R. et al. Multi-scale predictions fusion for robust hand detection and classification. Multimed Tools Appl 78, 35633–35650 (2019). https://doi.org/10.1007/s11042-019-08080-4
DOI: https://doi.org/10.1007/s11042-019-08080-4