COSONet: Compact Second-Order Network for Video Face Recognition

Mao, Yirong; Wang, Ruiping; Shan, Shiguang; Chen, Xilin

doi:10.1007/978-3-030-20893-6_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11363)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

In this paper, we study the task of video face recognition. Face images in video typically exhibit large variations in expression, lighting, and pose, and also suffer from video-type noise such as motion blur, out-of-focus blur, and low resolution. To tackle these two types of challenges, we propose a comprehensive framework covering three aspects: neural network design, training data augmentation, and loss function. First, we devise an expressive COmpact Second-Order network (COSONet) to extract features from faces with large variations. The network encodes the correlation (e.g., the sample covariance matrix) of local features in a spatially invariant way, which is useful for modeling the global texture and appearance of face images. To handle the high dimensionality of the sample covariance matrix, we apply a layer with few parameters, named the 2D fully connected (2D-FC) layer, to reduce the dimension. Second, because still face datasets contain no video-type noise and video face datasets have small inter-frame variation, we build a large augmented dataset with both large face variations and video-type noise from existing still-face data. Finally, to obtain a discriminative face descriptor while balancing the effect of images of varying quality, we design a mixture loss function that encourages discriminability and simultaneously regularizes the feature. Detailed experiments show that the proposed framework achieves accuracy highly competitive with state-of-the-art approaches on the IJB-A and PaSC datasets.
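
As a rough illustration of the second-order encoding described above, the following NumPy sketch computes the sample covariance of the local features in a convolutional feature map and then shrinks it with a small two-sided projection. The abstract does not specify the form of the 2D-FC layer, so the bilinear projection `W^T C W` used here is only an assumption, and the names `covariance_pooling` and `fc2d`, the feature-map size, and the random weights are all hypothetical.

```python
import numpy as np


def covariance_pooling(feature_map):
    """Sample covariance of local features: (C, H, W) -> (C, C)."""
    c, h, w = feature_map.shape
    x = feature_map.reshape(c, h * w)       # each column is one local feature
    x = x - x.mean(axis=1, keepdims=True)   # centre over spatial positions
    return x @ x.T / (h * w - 1)


def fc2d(cov, weight):
    """Assumed 2D-FC form (not confirmed by the abstract): W^T C W."""
    return weight.T @ cov @ weight


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((64, 7, 7))   # hypothetical C=64, H=W=7
weight = 0.1 * rng.standard_normal((64, 8))     # hypothetical 64 -> 8 projection

cov = covariance_pooling(feature_map)           # 64 x 64 second-order statistic
compact = fc2d(cov, weight)                     # 8 x 8 reduced matrix
descriptor = compact[np.triu_indices(8)]        # symmetric, so keep upper triangle

# Shuffling the spatial positions leaves the covariance unchanged,
# which is the sense in which the encoding is spatially invariant.
perm = rng.permutation(7 * 7)
shuffled = feature_map.reshape(64, -1)[:, perm].reshape(64, 7, 7)
is_invariant = np.allclose(covariance_pooling(shuffled), cov)
```

Under these assumed sizes the projection has 64 × 8 = 512 weights, whereas an ordinary fully connected layer mapping the flattened 64 × 64 covariance to 64 outputs would need 4096 × 64 = 262,144, which gives a sense of how a two-sided layer can reduce the dimension with few parameters.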

Notes

1. The source code is available at http://vipl.ict.ac.cn/resources/codes.

Acknowledgements

This work is partially supported by the Natural Science Foundation of China under contract Nos. 61390511 and 61772500, the 973 Program under contract No. 2015CB351802, the Frontier Science Key Research Project of CAS under No. QYZDJ-SSW-JSC009, and the Youth Innovation Promotion Association of CAS under No. 2015085.

Author information

Authors and Affiliations

Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
Yirong Mao, Ruiping Wang, Shiguang Shan & Xilin Chen
University of Chinese Academy of Sciences, Beijing, 100049, China
Yirong Mao, Ruiping Wang, Shiguang Shan & Xilin Chen

Authors

Yirong Mao
Ruiping Wang
Shiguang Shan
Xilin Chen

Corresponding author

Correspondence to Ruiping Wang.

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
C. V. Jawahar
ANU, Canberra, ACT, Australia
Hongdong Li
Simon Fraser University, Burnaby, BC, Canada
Greg Mori
ETH Zurich, Zurich, Zürich, Switzerland
Konrad Schindler

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 241 KB)

About this paper

Cite this paper

Mao, Y., Wang, R., Shan, S., Chen, X. (2019). COSONet: Compact Second-Order Network for Video Face Recognition. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11363. Springer, Cham. https://doi.org/10.1007/978-3-030-20893-6_4

DOI: https://doi.org/10.1007/978-3-030-20893-6_4
Published: 29 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20892-9
Online ISBN: 978-3-030-20893-6
eBook Packages: Computer Science, Computer Science (R0)
