Abstract
Face video retrieval has drawn considerable research attention recently. Most prior work relies on either appearance features or correlation features alone, which can degrade retrieval performance. In this paper, we fuse appearance features and correlation features via a deep convolutional neural network to exploit the rich information in face videos for retrieval. The network extracts an appearance feature from a frame and a correlation feature from the covariance matrix of a face video, then fuses them into a comprehensive video representation. The fused feature is projected into a low-dimensional Hamming space via hash functions for the retrieval task. The network integrates feature extraction, feature fusion, and hash learning into a unified optimization framework to guarantee optimal compatibility of appearance features and correlation features. Experiments on two challenging TV-Series datasets demonstrate the effectiveness of the proposed method.
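The pipeline described in the abstract (appearance feature, covariance-based correlation feature, fusion, hashing, Hamming-space retrieval) can be illustrated with a minimal NumPy sketch. This is not the network from the paper: the per-frame descriptors are random stand-ins for CNN features, the appearance feature is simply the descriptor of the first frame, and a fixed random projection with a sign threshold replaces the learned hash functions. All function names, dimensions, and the choice of the first frame are assumptions made for illustration only.

```python
import numpy as np


def covariance_feature(frames):
    """Correlation feature: flattened upper triangle of the frame covariance matrix.

    frames has shape (n_frames, d); the result has d * (d + 1) / 2 entries.
    """
    cov = np.cov(frames, rowvar=False)        # (d, d), columns are variables
    rows, cols = np.triu_indices(cov.shape[0])
    return cov[rows, cols]


def fuse(frames):
    """Concatenate an appearance feature with the correlation feature.

    Assumption: the descriptor of the first frame stands in for the
    appearance feature that a CNN would extract from a frame.
    """
    appearance = frames[0]
    correlation = covariance_feature(frames)
    return np.concatenate([appearance, correlation])


def hash_codes(features, projection):
    """Map real-valued features to binary codes by thresholding a linear projection.

    Assumption: a random projection is used here in place of learned hash functions.
    """
    return (features @ projection > 0).astype(np.uint8)


def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query, plus the distances."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists


# Toy data: 5 "videos", each with 20 frames of 8-dimensional descriptors.
rng = np.random.default_rng(0)
d, n_bits = 8, 16
videos = [rng.normal(size=(20, d)) for _ in range(5)]

fused = np.stack([fuse(v) for v in videos])   # (5, d + d * (d + 1) / 2)
projection = rng.normal(size=(fused.shape[1], n_bits))
codes = hash_codes(fused, projection)         # (5, n_bits) binary codes

# Query with video 2; it is at Hamming distance 0 from itself.
order, dists = hamming_rank(codes[2], codes)
```

In the paper, feature extraction, fusion, and hashing are optimized jointly; the sketch keeps the three stages separate only to make the data flow visible.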
Acknowledgments
This work was supported in part by the Natural Science Foundation of China (NSFC) under Grants No. 61472038 and No. 61375044.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Jing, C., Dong, Z., Pei, M., Jia, Y. (2018). Fusing Appearance Features and Correlation Features for Face Video Retrieval. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science, vol 10736. Springer, Cham. https://doi.org/10.1007/978-3-319-77383-4_15
DOI: https://doi.org/10.1007/978-3-319-77383-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77382-7
Online ISBN: 978-3-319-77383-4
eBook Packages: Computer Science, Computer Science (R0)