Abstract
In this work, we propose a method for learning saliency features that respond only in face regions. Based on these features, we design a joint pipeline that detects and recognizes faces as part of the human–robot interaction (HRI) system of the SRU robot. The architecture has three characteristics: (i) The detectors in the network are activated only by face regions. Convolving the input image with these detectors produces a group of saliency feature maps that indicate the locations of faces. (ii) Face representations are obtained by pooling over these high-response regions, and they are discriminative enough for face identification. Hence, classification and detection can be combined in a single network. (iii) To enhance the saliency of the features, false responses are suppressed by introducing a saliency term into the loss function, which forces the feature detectors to ignore non-face inputs. This term can also be seen as a branch of a multi-task network that learns the background. Restricting false responses improves face verification performance, especially when training and testing are carried out on different datasets. In the experiments, we analyze the effect of the saliency term on face verification and benchmark the discriminative ability of the saliency features on LFW. The effectiveness of the method for face detection is verified by experimental results on FDDB.
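The role of the saliency term in point (iii) can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so everything below is an assumption made for illustration: the saliency term is modeled as the mean squared activation of a feature map on non-face inputs, the identification task as a softmax cross-entropy, and the names `saliency_penalty`, `joint_loss`, and the weight `lam` are hypothetical rather than taken from the paper.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over identity logits (identification task)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def saliency_penalty(feature_map):
    """Assumed saliency term: mean squared activation of a 2D feature map.

    It is zero only when the detector is completely silent on the input.
    """
    values = [v for row in feature_map for v in row]
    return sum(v * v for v in values) / len(values)


def joint_loss(sample, lam=0.5):
    """Hypothetical multi-task loss for a single training sample.

    Face samples contribute the identification loss; non-face (background)
    samples contribute only the saliency penalty, weighted by `lam`.
    """
    if sample["is_face"]:
        return softmax_cross_entropy(sample["logits"], sample["label"])
    return lam * saliency_penalty(sample["feature_map"])


# A face sample with two candidate identities and a background patch
# on which the detector fires by mistake.
face = {"is_face": True, "logits": [2.0, 0.5], "label": 0}
background = {"is_face": False, "feature_map": [[1.0, -1.0], [0.0, 2.0]]}

print(joint_loss(face))
print(joint_loss(background))
```

Under these assumptions, any response on a background patch increases the loss, so minimizing it pushes the detectors toward silence on non-face inputs while the face branch continues to learn identities.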
Acknowledgments
This work was supported by the National Basic Research Program of China (973 Program) under Grant 2014CB744206 and the Fundamental Research Funds for the China Central Universities of UESTC under Grant ZYGX2013Z003.
Cite this article
Zhao, Q., Ge, S.S., Ye, M. et al. Learning Saliency Features for Face Detection and Recognition Using Multi-task Network. Int J of Soc Robotics 8, 709–720 (2016). https://doi.org/10.1007/s12369-016-0347-x
DOI: https://doi.org/10.1007/s12369-016-0347-x