Abstract
Hand gesture recognition is one of the most popular forms of human-computer interface (HCI). The first step in most vision-based gesture recognition systems is hand detection and segmentation. Because hands are involved in a wide variety of daily tasks, detection must cope with both extreme illumination changes and the intrinsic variability of hand appearance. To overcome these problems, we propose a new method for 2D hand detection that combines multi-feature-based hand proposal generation with cascaded convolutional neural network (CCNN) classification. To account for varying luminance, we use color, Gabor, HOG and SIFT features to discriminate skin regions and generate hand proposals. We further propose a cascaded CNN that preserves deep context information to detect hands among the proposals. The proposed Multi-Feature Supervised Cascaded CNN (MFS-CCNN) method is evaluated on a combination of datasets, using the Oxford Hands, VIVA Hand Detection and EgoHands datasets as positive samples and ImageNet 2012 and FDDB as negative samples. The proposed method achieves competitive results.
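The proposal-then-classify pipeline described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' MFS-CCNN: it uses only a color cue (a commonly quoted fixed Cb/Cr skin range, which is an assumption here), omits the Gabor, HOG and SIFT features, and replaces each CNN stage with a hypothetical hand-written scoring function. All names (`rgb_to_cbcr`, `is_skin`, `skin_proposals`, `box_area`, `squareness`, `cascade_filter`) are illustrative. What it shows is the control flow: skin pixels are grouped into connected regions, each region's bounding box becomes a hand proposal, and a cascade of stages discards proposals early so that later, more expensive stages see fewer candidates.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each in 0..255
Box = Tuple[int, int, int, int]       # (top, left, bottom, right), inclusive
Stage = Tuple[Callable[[Box], float], float]  # (score function, threshold)


def rgb_to_cbcr(r: int, g: int, b: int) -> Tuple[float, float]:
    """Full-range ITU-R BT.601 (JPEG/JFIF) chroma components."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(p: Pixel) -> bool:
    """Assumed fixed skin range in Cb/Cr; a stand-in for the multi-feature model."""
    cb, cr = rgb_to_cbcr(*p)
    return 77 <= cb <= 127 and 133 <= cr <= 173


def skin_proposals(image: Sequence[Sequence[Pixel]], min_area: int = 4) -> List[Box]:
    """Bounding boxes of 4-connected skin regions with at least min_area pixels."""
    h = len(image)
    w = len(image[0]) if h else 0
    mask = [[is_skin(image[y][x]) for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one connected skin region, tracking its extent and size.
            stack = [(y, x)]
            seen[y][x] = True
            top, left, bottom, right, area = y, x, y, x, 0
            while stack:
                cy, cx = stack.pop()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if area >= min_area:  # drop tiny regions as unlikely hands
                boxes.append((top, left, bottom, right))
    return boxes


def box_area(b: Box) -> float:
    """Hypothetical cheap first-stage score: box area in pixels."""
    top, left, bottom, right = b
    return float((bottom - top + 1) * (right - left + 1))


def squareness(b: Box) -> float:
    """Hypothetical second-stage score: 1.0 for square boxes, smaller when elongated."""
    top, left, bottom, right = b
    bh, bw = bottom - top + 1, right - left + 1
    return min(bh, bw) / max(bh, bw)


def cascade_filter(boxes: List[Box], stages: Sequence[Stage]) -> List[Box]:
    """Keep only proposals that pass every stage; stop as soon as none remain."""
    survivors = list(boxes)
    for score_fn, threshold in stages:
        survivors = [b for b in survivors if score_fn(b) >= threshold]
        if not survivors:
            break
    return survivors


if __name__ == "__main__":
    S = (224, 172, 140)  # a skin-toned pixel
    B = (0, 0, 255)      # a blue background pixel
    image = [
        [S, S, B, B],
        [S, S, B, B],
        [B, B, B, S],
    ]
    proposals = skin_proposals(image, min_area=1)
    print(proposals)
    print(cascade_filter(proposals, [(box_area, 4.0), (squareness, 0.5)]))
```

In a real detector each stage would be a progressively larger classifier (in the paper, a CNN), and the early rejection of cheap-to-discard proposals is what keeps the overall cost low.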
Change history
19 November 2019
The Publisher regrets an error on the printed front cover of the October 2019 issue. The issue numbers were incorrectly listed as Volume 91, Nos. 10-12, October 2019. The correct number should be: "Volume 91, No. 10, October 2019"
Acknowledgements
This work is supported by the Foundation of the China Institute of Water Resources and Hydropower Research (GE0145B112017).
About this article
Cite this article
Wang, Q., Zhang, G. & Yu, S. 2D Hand Detection Using Multi-Feature Skin Model Supervised Cascaded CNN. J Sign Process Syst 91, 1105–1113 (2019). https://doi.org/10.1007/s11265-018-1406-3
DOI: https://doi.org/10.1007/s11265-018-1406-3