Abstract
Face image-video retrieval refers to retrieving videos of a specific person given an image query, or retrieving face images of a person given a video clip query. It has attracted much attention because of its broad applications, such as tracking and identifying suspects. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, IRAH enables fast cross-modality retrieval by embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. To bridge the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. Retrieval accuracy is thereby improved by leveraging both the label information across modalities and the semantic structure obtained from the implicit relative attributes within each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results show that the proposed method outperforms several state-of-the-art cross-modality hashing methods. The gains are especially significant with 8-bit hash codes, reaching up to 12% improvement over the second-best method tested.
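The retrieval mechanism the abstract relies on, mapping features from two different modalities into one compact Hamming space and ranking database items by Hamming distance, can be sketched as below. This is a minimal illustration under stated assumptions, not IRAH itself: the modality-specific hash functions are replaced by random hyperplanes (a data-independent, LSH-style scheme), whereas IRAH learns its hash functions from labels and implicit relative attributes. The feature vectors are toy values; in the paper's setting the image side would use kernelized Euclidean features and the video side features derived from a Riemannian-manifold representation, neither of which is modeled here. The names `random_projection`, `encode`, `hamming`, and `rank_database` are hypothetical.

```python
import random
from typing import List, Sequence


def random_projection(n_bits: int, dim: int, seed: int) -> List[List[float]]:
    """Draw n_bits random hyperplanes in R^dim.

    Hypothetical stand-in for a learned, modality-specific hash function.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(features: Sequence[float], projection: List[List[float]]) -> int:
    """Binarize a feature vector into a packed integer code.

    Bit k is 1 iff the projection of `features` onto row k is >= 0.
    """
    if any(len(row) != len(features) for row in projection):
        raise ValueError("feature dimension does not match projection")
    code = 0
    for k, row in enumerate(projection):
        if sum(w * x for w, x in zip(row, features)) >= 0.0:
            code |= 1 << k
    return code


def hamming(a: int, b: int) -> int:
    """Hamming distance between two packed codes: popcount of their XOR."""
    return bin(a ^ b).count("1")


def rank_database(query_code: int, db_codes: List[int]) -> List[int]:
    """Database indices ordered by Hamming distance to the query (ties by index)."""
    return sorted(range(len(db_codes)),
                  key=lambda i: (hamming(query_code, db_codes[i]), i))


if __name__ == "__main__":
    N_BITS = 8  # the code length at which the abstract reports its largest gains
    # Two hypothetical hash functions with different input dimensions,
    # one per modality, both producing codes in the same 8-bit Hamming space.
    w_img = random_projection(N_BITS, dim=3, seed=0)
    w_vid = random_projection(N_BITS, dim=4, seed=1)

    query = encode([0.2, -1.3, 0.7], w_img)  # toy image feature (image query)
    videos = [[0.5, 0.1, -0.4, 1.2],         # toy video features (database)
              [-0.9, 0.3, 0.8, -0.2],
              [1.1, -0.6, 0.0, 0.4]]
    db = [encode(v, w_vid) for v in videos]
    print(rank_database(query, db))
```

Because each code is a packed integer, comparing a query with a database item costs one XOR and one popcount, which is what makes short codes such as 8 bits attractive for large collections. With untrained random hyperplanes in each modality the resulting ranking carries no semantic meaning; learning the projections so that images and videos of the same identity receive nearby codes is the part the paper addresses.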
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 61472216.
Cite this article
Dai, P., Wang, X., Zhang, W. et al. Implicit relative attribute enabled cross-modality hashing for face image-video retrieval. Multimed Tools Appl 77, 23547–23577 (2018). https://doi.org/10.1007/s11042-018-5684-3