Abstract
In this paper, we study the task of video face recognition. Face images in video typically exhibit large variations in expression, lighting, and pose, and also suffer from video-type noise such as motion blur, out-of-focus blur, and low resolution. To tackle these two types of challenges, we propose a comprehensive framework that addresses three aspects: neural network design, training data augmentation, and the loss function. First, we devise an expressive COmpact Second-Order network (COSONet) to extract features from faces with large variations. The network encodes the correlation (e.g., the sample covariance matrix) of local features in a spatially invariant way, which is useful for modeling the global texture and appearance of face images. To handle the curse of dimensionality arising from the high-dimensional sample covariance matrix, we apply a 2D fully connected (2D-FC) layer with few parameters to reduce the dimension. Second, because still face datasets lack video-type noise and video face datasets have small inter-frame variation, we augment an existing still face dataset to build a large dataset with both large face variations and video-type noise. Finally, to obtain a discriminative face descriptor while balancing the effect of images of varying quality, we design a mixture loss function that encourages discriminability and simultaneously regularizes the feature. Detailed experiments show that the proposed framework achieves accuracy competitive with state-of-the-art approaches on the IJB-A and PaSC datasets.
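The second-order pooling and dimension reduction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the form of the 2D-FC layer, so the two-sided projection Y = W_left^T C W_right used here is an assumption, and the names `covariance_pool`, `two_sided_projection`, and the sizes `n`, `d`, `k` are hypothetical. The sketch uses only the Python standard library and random inputs.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def covariance_pool(features):
    """Sample covariance of n local features (rows), each of dimension d.

    Spatial positions are summed out, so the d x d result does not depend
    on the order of the rows.
    """
    n = len(features)
    d = len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in features]
    cov = matmul(transpose(centered), centered)
    return [[v / (n - 1) for v in row] for row in cov]


def two_sided_projection(cov, w_left, w_right):
    """Assumed 2D-FC form: Y = W_left^T . C . W_right, mapping d x d to k x k."""
    return matmul(matmul(transpose(w_left), cov), w_right)


random.seed(0)
n, d, k = 49, 8, 3  # hypothetical sizes: a 7x7 feature map, 8 channels, 3x3 output

# Random stand-ins for local CNN features and for learned projection weights.
features = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
w_left = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(d)]
w_right = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(d)]

cov = covariance_pool(features)                       # d x d second-order descriptor
compact = two_sided_projection(cov, w_left, w_right)  # k x k compact descriptor

# Reordering the spatial positions leaves the covariance unchanged (up to rounding).
cov_shuffled = covariance_pool(random.sample(features, len(features)))

# Parameter counts: two-sided projection versus a dense layer on the flattened matrix.
params_two_sided = 2 * d * k
params_flat_fc = (d * d) * (k * k)
```

Under this assumed form, the projection needs 2dk weights, whereas a conventional fully connected layer mapping the flattened d x d matrix to k x k outputs would need d²k² weights, which is one way to read "few parameters" in the abstract.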
Notes
1. The source code is available at http://vipl.ict.ac.cn/resources/codes.
Acknowledgements
This work is partially supported by the Natural Science Foundation of China under contract Nos. 61390511 and 61772500, the 973 Program under contract No. 2015CB351802, the Frontier Science Key Research Project CAS No. QYZDJ-SSW-JSC009, and the Youth Innovation Promotion Association CAS No. 2015085.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Mao, Y., Wang, R., Shan, S., Chen, X. (2019). COSONet: Compact Second-Order Network for Video Face Recognition. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11363. Springer, Cham. https://doi.org/10.1007/978-3-030-20893-6_4
DOI: https://doi.org/10.1007/978-3-030-20893-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20892-9
Online ISBN: 978-3-030-20893-6
eBook Packages: Computer Science, Computer Science (R0)