Multi-scale predictions fusion for robust hand detection and classification

Ding, Lu; Wang, Yong; Laganière, Robert; Luo, Xinbin; Fu, Shan

doi:10.1007/s11042-019-08080-4

Published: 20 August 2019

Volume 78, pages 35633–35650 (2019)
Multimedia Tools and Applications

Lu Ding¹, Yong Wang² (ORCID: orcid.org/0000-0001-6559-9550), Robert Laganière², Xinbin Luo³ & Shan Fu³

Abstract

In this paper, we present a multi-scale predictions fusion region-based Fully Convolutional Network (MSPF-RFCN) to robustly detect and classify human hands under various challenging conditions. In our approach, the input image is passed through the proposed network to generate score maps based on the fusion of multi-scale predictions. The network is specifically designed to handle small objects and uses an architecture based on region proposals generated at multiple scales. Our method is evaluated on two challenging hand datasets, the Vision for Intelligent Vehicles and Applications (VIVA) Challenge and the Oxford hand dataset, and is compared against recent hand detection algorithms. The experimental results demonstrate that the proposed method achieves state-of-the-art detection performance for hands of various sizes.
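The abstract does not spell out how the multi-scale predictions are fused. The sketch below is a hypothetical illustration under stated assumptions, not the authors' implementation: it applies R-FCN-style position-sensitive RoI pooling (Dai et al. 2016) to score maps produced at two strides (8 and 16, assumed), then fuses the per-scale class logits with a weighted average (equal weights, assumed) before a softmax. The function names ps_roi_pool, fuse_multiscale and constant_bank, the RoI coordinates and the score values are all invented for this example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: NOT the MSPF-RFCN implementation described in the paper.
# maps[channel][row][col]: a bank of 2-D score maps stored as nested lists.
ScoreMaps = List[List[List[float]]]


def ps_roi_pool(maps: ScoreMaps, roi: Tuple[float, float, float, float],
                stride: int, k: int, num_classes: int) -> List[float]:
    """R-FCN-style position-sensitive RoI pooling with average voting.

    Channel (i * k + j) * num_classes + c scores class c for spatial bin (i, j).
    roi is (x1, y1, x2, y2) in image pixels; dividing by stride maps it onto
    the score-map grid. Returns one raw score (logit) per class.
    """
    height, width = len(maps[0]), len(maps[0][0])
    x1, y1, x2, y2 = (v / stride for v in roi)
    bin_h = max(y2 - y1, 1e-6) / k
    bin_w = max(x2 - x1, 1e-6) / k
    scores = [0.0] * num_classes
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            # Integer cell range covered by bin (i, j), clamped to the map and
            # forced to contain at least one cell.
            r0 = min(max(math.floor(y1 + i * bin_h), 0), height - 1)
            r1 = min(max(math.ceil(y1 + (i + 1) * bin_h), r0 + 1), height)
            c0 = min(max(math.floor(x1 + j * bin_w), 0), width - 1)
            c1 = min(max(math.ceil(x1 + (j + 1) * bin_w), c0 + 1), width)
            for c in range(num_classes):
                grid = maps[(i * k + j) * num_classes + c]
                cells = [grid[r][q] for r in range(r0, r1) for q in range(c0, c1)]
                scores[c] += sum(cells) / len(cells)  # average pooling in the bin
    return [s / (k * k) for s in scores]  # vote: average over the k * k bins


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multiscale(per_scale_logits: Sequence[Sequence[float]],
                    weights: Sequence[float] = ()) -> List[float]:
    """Assumed late fusion: weighted average of per-scale logits, then softmax."""
    n = len(per_scale_logits)
    w = list(weights) if weights else [1.0 / n] * n  # equal weights by default
    num_classes = len(per_scale_logits[0])
    fused = [sum(w[s] * per_scale_logits[s][c] for s in range(n))
             for c in range(num_classes)]
    return softmax(fused)


def constant_bank(k: int, num_classes: int, height: int, width: int,
                  class_values: Sequence[float]) -> ScoreMaps:
    """Toy score-map bank in which every cell of class c's channels equals class_values[c]."""
    return [[[class_values[ch % num_classes]] * width for _ in range(height)]
            for ch in range(k * k * num_classes)]


if __name__ == "__main__":
    k, num_classes = 2, 2                        # classes: 0 = background, 1 = hand
    roi = (8.0, 8.0, 40.0, 40.0)                 # hypothetical proposal on a 64x64 image
    fine = constant_bank(k, num_classes, 8, 8, [0.0, 2.0])    # stride-8 maps
    coarse = constant_bank(k, num_classes, 4, 4, [0.0, 1.0])  # stride-16 maps
    logits = [ps_roi_pool(fine, roi, 8, k, num_classes),
              ps_roi_pool(coarse, roi, 16, k, num_classes)]
    print(logits, fuse_multiscale(logits))
```

Averaging logits is only one plausible fusion rule; per-scale weights could instead be learned, which is why the weights are exposed as a parameter in this sketch.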

References

Bambach S, Crandall D, Yu C (2015) Viewpoint integration for hand-based recognition of social interactions from a first-person view. In: Proceedings of the 17th ACM international conference on multimodal interaction (ICMI), pp 351–354
Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387
Das N, Ohn-Bar E, Trivedi M (2015) On performance evaluation of driver hand detection algorithms: challenges, dataset, and metrics. In: IEEE conference on intelligent transportation systems, pp 2953–2958
Das N, Ohn-Bar E, Trivedi MM (2015) On performance evaluation of driver hand detection algorithms: challenges, dataset, and metrics. In: 2015 IEEE 18th international conference on intelligent transportation systems (ITSC). IEEE, pp 2953–2958
Dollar P, Appel R, Belongie S, Perona P (2014) Fast feature pyramids for object detection. IEEE Trans Pattern Anal Mach Intell 36(8):1532–1545
Dollar P, Tu Z, Perona P, Belongie S (2009) Integral channel features. In: British machine vision conference (BMVC)
Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2015) Region-based convolutional networks for accurate object detection and semantic segmentation. IEEE Trans Pattern Anal Mach Intell
HCII Lab, SCUT
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
http://cvrr.ucsd.edu/vivachallenge/index.php/hands/hand-detection/
He K, Gkioxari G, Dollar P, Girshick R (2017) Mask R-CNN. In: IEEE international conference on computer vision (ICCV). IEEE
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: Convolutional architecture for fast feature embedding, arXiv:1408.5093
Le THN, Zhu C, Zheng Y, Luu K, Savvides M (2016) Robust hand detection in vehicles. In: 2016 23rd international conference on pattern recognition (ICPR). IEEE, pp 573–578
Le THN, Quach KG, Zhu C, Duong CN, Luu K, Savvides M (2017) Robust hand detection and classification in vehicles and in the wild. In: IEEE conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 1203–1210
Lin T-Y, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: CVPR
Liu W, Rabinovich A, Berg AC (2015) ParseNet: looking wider to see better, arXiv:1506.04579
Liu D, Du D, Zhang L, Luo T, Wu Y, Huang F, Lyu S (2019) Scale invariant fully convolutional network: detecting hands efficiently, arXiv:1906.04634
Mittal A, Zisserman A, Torr PHS (2011) Hand detection using multiple proposals. In: British machine vision conference
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: CVPR
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556
Shelhamer E, Long J, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
The vision for intelligent vehicles and applications (VIVA) challenge, Laboratory for Intelligent and Safe Automobiles, UCSD. http://cvrr.ucsd.edu/vivachallenge/
Verbickas R, Laganière R, Laroche D, Zhu C, Xu X, Ors A (2017) SqueezeMap: fast pedestrian detection on a low-power automotive processor using efficient convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 146–154
Xu Y, Lu Y (2015) Adaptive weighted fusion: a novel fusion approach for image classification. Neurocomputing 168:566–574
Xu Y, Zhong Z, Yang J, You J, Zhang D (2017) A new discriminative sparse representation method for robust face recognition via l2 regularization. IEEE Transactions on Neural Networks and Learning Systems 28(10):2233–2242
Yan C, Xie H, Chen J, Zha Z, Hao X, Zhang Y, Dai Q (2018) A fast Uyghur text detector for complex background images. IEEE Trans Multimedia 20(12):3389–3398
Yan C, Li L, Zhang C, Liu B, Zhang Y, Dai Q (2019) Cross-modality bridging and knowledge transferring for image understanding. IEEE Trans Multimedia. https://ieeexplore.ieee.org/document/8662712
Yan C, Tu Y, Wang X, Zhang Y, Hao X, Zhang Y, Dai Q (2019) STAT: spatial-temporal attention mechanism for video captioning. IEEE Trans Multimedia. https://ieeexplore.ieee.org/document/8744407
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, Cham, pp 818–833
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4203–4212
Zhou T, Pillai PJ, Yalla VG (2016) Hierarchical context-aware hand detection algorithm for naturalistic driving. In: IEEE 19th international conference on intelligent transportation systems (ITSC). IEEE, pp 1291–1297

Author information

Authors and Affiliations

School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai, 200240, China
Lu Ding
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, K1N 6N5, Canada
Yong Wang & Robert Laganière
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Xinbin Luo & Shan Fu

Corresponding author

Correspondence to Yong Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ding, L., Wang, Y., Laganière, R. et al. Multi-scale predictions fusion for robust hand detection and classification. Multimed Tools Appl 78, 35633–35650 (2019). https://doi.org/10.1007/s11042-019-08080-4

Received: 17 December 2018
Revised: 17 July 2019
Accepted: 02 August 2019
Published: 20 August 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11042-019-08080-4
