Abstract
Face recognition from image sets has numerous real-life applications, including recognition from security and surveillance systems, multi-view camera networks and personal albums. An image set is an unordered collection of images (e.g., video frames, images acquired over long-term observations and personal albums) which exhibits a wide range of appearance variations. The main focus of previously developed methods has therefore been to find a suitable representation that optimally models these variations. This paper argues that such a representation cannot necessarily encode all of the information contained in the set. The paper therefore suggests a different approach, which does not resort to a single representation of an image set. Instead, the images of the set are retained in their original form, and an efficient classification strategy is developed which extends well-known simple binary classifiers to the task of multi-class image set classification. Unlike existing binary-to-multi-class extension strategies, which require multiple binary classifiers to be trained over a large number of images, the proposed approach is efficient since it trains only a few binary classifiers on very few images. Extensive experiments and comparisons with existing methods show that the proposed approach achieves state-of-the-art performance for face and object recognition based on image set classification on a number of challenging datasets.
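The abstract gives no algorithmic detail, so the following Python sketch is only one illustrative guess at how a single binary classifier trained on very few images could label a whole image set while the images are kept in their original form. The nearest-centroid classifier, the names `classify_set`, `centroid` and `sq_dist`, the `per_class` parameter and the toy 2-D feature vectors are all assumptions made for illustration; none of them is taken from the paper.

```python
from collections import Counter

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify_set(query, gallery, per_class=1):
    """Label a query image set using ONE binary classifier (hypothetical scheme).

    query   : list of feature vectors forming the query set
    gallery : dict mapping class label -> list of feature vectors
    """
    negatives, held_out = [], []
    for label, images in gallery.items():
        # A few gallery images per class act as the negative training class.
        negatives += images[:per_class]
        # The remaining gallery images are kept aside for the voting step.
        held_out += [(label, img) for img in images[per_class:]]

    # Assumed binary classifier: nearest centroid, query images vs. negatives.
    pos_c = centroid(query)
    neg_c = centroid(negatives)

    hits, totals = Counter(), Counter()
    for label, img in held_out:
        totals[label] += 1
        if sq_dist(img, pos_c) < sq_dist(img, neg_c):
            hits[label] += 1  # this gallery image looks like the query set

    # Predict the class whose held-out images most often resemble the query.
    return max(totals, key=lambda label: hits[label] / totals[label])

# Toy data: three well-separated classes in a 2-D feature space.
gallery = {
    "A": [[0, 0], [1, 0], [0, 1], [1, 1]],
    "B": [[10, 10], [11, 10], [10, 11], [11, 11]],
    "C": [[-10, 10], [-9, 10], [-10, 11], [-9, 11]],
}
query = [[10.5, 10.5], [10, 10.5], [11, 10.5]]
predicted = classify_set(query, gallery)
print(predicted)
```

In this toy setting only one binary classifier is trained, on three query images and three gallery images, regardless of the number of classes. A conventional one-vs-rest extension would instead train one classifier per class over the whole gallery.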
Additional information
Communicated by K. Kise.
About this article
Cite this article
Hayat, M., Khan, S.H. & Bennamoun, M. Empowering Simple Binary Classifiers for Image Set Based Face Recognition. Int J Comput Vis 123, 479–498 (2017). https://doi.org/10.1007/s11263-017-1000-3
DOI: https://doi.org/10.1007/s11263-017-1000-3