
Learning Saliency Features for Face Detection and Recognition Using Multi-task Network

International Journal of Social Robotics

Abstract

In this work, we propose a method for learning saliency features that respond only to face regions. Based on these features, we design a joint pipeline that detects and recognizes faces as part of the human–robot interaction (HRI) system of the SRU robot. The architecture has three characteristics: (i) The detectors in the network are activated only by face regions. Convolving them with the input image produces a group of saliency feature maps that indicate the locations of faces. (ii) Face representations are obtained by pooling over these high-response regions, and they are discriminative for face identification. Hence, classification and detection can be combined in a single network. (iii) To enhance the saliency of the features, false responses are suppressed by adding a saliency term to the loss function, which forces the feature detectors to ignore non-face inputs. This term can also be seen as a branch of a multi-task network that learns the background. Restricting false responses improves face verification performance, especially when training and testing use different datasets. In the experiments, we analyze the effect of the saliency term on face verification and benchmark the discriminative ability of the saliency features on LFW. The effectiveness of the method for face detection is verified by experimental results on FDDB.
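The interplay between points (ii) and (iii) can be illustrated with a minimal sketch. The abstract does not give the loss or the pooling rule, so everything below is an assumption made for illustration: the saliency term is taken to be the mean squared activation on non-face inputs, `lam` is a hypothetical weighting factor, pooling is a plain average over activations above a hypothetical `threshold`, and all function names are invented here. It should not be read as the authors' formulation.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def saliency_penalty(feature_map):
    """Mean squared activation of a 2-D feature map.

    Assumed form of the saliency term: it is zero only when the
    detector does not respond anywhere in the input.
    """
    values = [v for row in feature_map for v in row]
    return sum(v * v for v in values) / len(values)


def multitask_loss(logits, label, feature_map, is_face, lam=0.5):
    """Identity loss for face inputs, response suppression for non-face inputs.

    `lam` is a hypothetical weight balancing the two tasks.
    """
    if is_face:
        return softmax_cross_entropy(logits, label)
    return lam * saliency_penalty(feature_map)


def pool_salient_region(feature_map, threshold):
    """Average the activations above `threshold` and report where they are.

    Returns (pooled_value, [(row, col), ...]); the locations stand in for
    the detection output and the pooled value for the face representation.
    """
    hits = [(r, c, v)
            for r, row in enumerate(feature_map)
            for c, v in enumerate(row)
            if v > threshold]
    if not hits:
        return 0.0, []
    pooled = sum(v for _, _, v in hits) / len(hits)
    return pooled, [(r, c) for r, c, _ in hits]
```

Under these assumptions, a background patch that still triggers the detector incurs a positive penalty, so minimizing the loss pushes its feature map toward zero, while the same feature map serves both to locate a face (the high-response positions) and to describe it (the pooled value).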



Acknowledgments

This work was supported by the National Basic Research Program of China (973 Program) under Grant 2014CB744206 and the Fundamental Research Funds for the China Central Universities of UESTC under Grant ZYGX2013Z003.

Author information

Corresponding author

Correspondence to Qian Zhao.


About this article


Cite this article

Zhao, Q., Ge, S.S., Ye, M. et al. Learning Saliency Features for Face Detection and Recognition Using Multi-task Network. Int J of Soc Robotics 8, 709–720 (2016). https://doi.org/10.1007/s12369-016-0347-x


  • DOI: https://doi.org/10.1007/s12369-016-0347-x
