Abstract
Video-sharing websites and surveillance scenarios often produce large numbers of face videos. This paper proposes a joint dictionary learning and subspace segmentation method for video-based face recognition (VFR). We assume that the face images from one subject's video lie in a union of multiple linear subspaces, and that there exists a global dictionary that can represent these images and segment them into their corresponding subspaces. This assumption results in a “chicken and egg” problem, where subspace clustering and dictionary learning are mutually dependent. To solve this problem, we propose a joint optimization model that includes three parts. The first part seeks a low-rank representation for subspace segmentation; the second part encourages the dictionary to accurately represent the data while tolerating frame-wise corruption or outliers; and the third part is a regularization on the dictionary. An alternating minimization method is employed as an efficient solution to the proposed joint formulation. In each iteration, it alternately updates the subspace structure and the dictionary, so that each update refines the result of the other. Experiments on three video-based face databases show that our approach consistently outperforms the state-of-the-art methods.
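To make the alternating scheme more concrete, the following is a minimal sketch on a simplified surrogate objective, not the formulation from the paper. It assumes a squared-error data term, a nuclear-norm penalty on the coefficient matrix Z (standing in for the low-rank representation part), and a Frobenius-norm penalty on the dictionary D (standing in for the dictionary regularization). The frame-wise corruption term mentioned in the abstract is omitted for brevity. All names (`svt`, `objective`, `alternate`) and the parameters `lam` and `gamma` are hypothetical.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def objective(X, D, Z, lam, gamma):
    """Assumed surrogate: 0.5*||X - DZ||_F^2 + lam*||Z||_* + 0.5*gamma*||D||_F^2."""
    fit = 0.5 * np.linalg.norm(X - D @ Z, "fro") ** 2
    low_rank = lam * np.linalg.norm(Z, "nuc")
    dict_reg = 0.5 * gamma * np.linalg.norm(D, "fro") ** 2
    return fit + low_rank + dict_reg


def alternate(X, n_atoms, lam=0.1, gamma=0.1, n_iter=30, seed=0):
    """Alternate between a low-rank coefficient update and a dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))  # random initial dictionary
    Z = np.zeros((n_atoms, X.shape[1]))             # coefficients, one column per frame
    history = [objective(X, D, Z, lam, gamma)]
    for _ in range(n_iter):
        # Z-step: one proximal gradient step with D fixed.
        # The step size is 1/L, where L = ||D||_2^2 bounds the gradient's Lipschitz constant.
        step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)
        grad = D.T @ (D @ Z - X)
        Z = svt(Z - step * grad, step * lam)
        # D-step: exact ridge-regression solution with Z fixed,
        # D = X Z^T (Z Z^T + gamma I)^{-1}, computed via a linear solve.
        G = Z @ Z.T + gamma * np.eye(n_atoms)
        D = np.linalg.solve(G, Z @ X.T).T
        history.append(objective(X, D, Z, lam, gamma))
    return D, Z, history
```

Calling `alternate(X, n_atoms=8)` on a feature matrix `X` with one column per frame returns the learned dictionary, the coefficient matrix, and the objective value after each iteration. In this sketch the Z-step is a descent step and the D-step is an exact minimization, so the recorded objective should not increase from one iteration to the next.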
Notes
1. Here and for the rest of the paper, a variable with a superscript * denotes the optimal solution. This notation should not be confused with the symbol for the Hermitian transpose.
Acknowledgement
This work was supported by the Army Research Office MURI Grant W911NF-09-1-0383. We also thank Dr. Ruiping Wang for sharing the processed data.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, G., He, R., Davis, L.S. (2015). Jointly Learning Dictionaries and Subspace Structure for Video-Based Face Recognition. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9005. Springer, Cham. https://doi.org/10.1007/978-3-319-16811-1_7
DOI: https://doi.org/10.1007/978-3-319-16811-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16810-4
Online ISBN: 978-3-319-16811-1
eBook Packages: Computer Science, Computer Science (R0)