Implicit relative attribute enabled cross-modality hashing for face image-video retrieval

Dai, Peng; Wang, Xue; Zhang, Weihang; Zhang, Pengbo; You, Wei

doi:10.1007/s11042-018-5684-3

Published: 31 January 2018

Volume 77, pages 23547–23577 (2018)

Multimedia Tools and Applications

Abstract

Face image-video retrieval refers to retrieving videos of a specific person with an image query, or searching for face images of a person with a video clip query. It has attracted much attention because of its broad applications, such as suspect tracking and identification. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, IRAH enables fast cross-modality retrieval by embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. To bridge the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. Retrieval accuracy is thereby improved by leveraging both the label information shared across modalities and the semantic structure obtained from the implicit relative attributes in each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, i.e., the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results demonstrate the superiority of the proposed method over several state-of-the-art cross-modality hashing methods. The performance gains are especially significant when the hash code length is 8 bits, with improvements of up to 12% over the second-best of the tested methods.
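The retrieval step that a unified Hamming space makes possible can be sketched as follows. This is only an illustration of ranking by Hamming distance once binary codes exist; it is not the IRAH learning procedure, which the abstract does not specify in detail. The function names and the 8-bit codes below are hypothetical and chosen purely for the example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two binary codes (stored as ints) differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: list[int]) -> list[int]:
    """Return database indices ordered by increasing Hamming distance.

    sorted() is stable, so codes at equal distance keep their index order.
    """
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Hypothetical 8-bit codes (illustrative values, not taken from the paper):
# one face-image query and four face-video clips hashed into the same space.
image_query = 0b10110010
video_codes = [0b10110011, 0b01001101, 0b10100010, 0b11111111]

distances = [hamming_distance(image_query, c) for c in video_codes]
ranking = rank_by_hamming(image_query, video_codes)

print("distances:", distances)
print("ranking:  ", ranking)
```

Because both modalities share one code space, an image query can be compared against video codes with a single XOR and a bit count per item, which is what makes short codes attractive for large databases.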

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant #61472216.

Author information

Authors and Affiliations

State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing, 100084, China
Peng Dai, Xue Wang, Weihang Zhang, Pengbo Zhang & Wei You

Corresponding author

Correspondence to Xue Wang.

About this article

Cite this article

Dai, P., Wang, X., Zhang, W. et al. Implicit relative attribute enabled cross-modality hashing for face image-video retrieval. Multimed Tools Appl 77, 23547–23577 (2018). https://doi.org/10.1007/s11042-018-5684-3

Received: 15 June 2017
Revised: 29 November 2017
Accepted: 18 January 2018
Published: 31 January 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11042-018-5684-3
