
Implicit relative attribute enabled cross-modality hashing for face image-video retrieval

Multimedia Tools and Applications

Abstract

Face image-video retrieval refers to retrieving videos of a specific person with an image query, or searching for face images of a person with a video clip query. It has attracted much attention because of its broad applications, such as suspect tracking and identification. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, IRAH enables fast cross-modality retrieval by embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. To bridge the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. Retrieval accuracy is thereby improved by leveraging both the label information across modalities and the semantic structure obtained from the implicit relative attributes in each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results demonstrate the superiority of the proposed method over several state-of-the-art cross-modality hashing methods. The performance gains are especially significant when the hash code length is 8 bits, with improvements of up to 12% over the second-best method among those tested.
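To illustrate why a unified Hamming space makes retrieval fast, the following is a minimal sketch of the ranking step only, assuming binary codes for both modalities have already been produced. It is not the IRAH learning procedure described in the paper. The function name `hamming_rank` and the 8-bit codes are hypothetical and chosen purely for illustration.

```python
import numpy as np

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to a query code.

    Both arguments hold 0/1 bits; database_codes has one row per item.
    Returns (indices sorted from nearest to farthest, per-item distances).
    """
    query_code = np.asarray(query_code, dtype=np.uint8)
    database_codes = np.asarray(database_codes, dtype=np.uint8)
    # Hamming distance = number of bit positions where the codes differ.
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    # A stable sort keeps the original order among equally distant items.
    order = np.argsort(distances, kind="stable")
    return order, distances

# Hypothetical 8-bit codes: one image query and three video clips.
image_query = [1, 0, 1, 1, 0, 0, 1, 0]
video_codes = [
    [1, 0, 1, 1, 0, 0, 1, 0],  # identical to the query
    [0, 1, 0, 0, 1, 1, 0, 1],  # every bit flipped
    [1, 0, 1, 0, 0, 0, 1, 1],  # two bits differ
]

order, distances = hamming_rank(image_query, video_codes)
print(order.tolist())      # [0, 2, 1]
print(distances.tolist())  # [0, 8, 2]
```

Because images and videos share one code space in this setting, the same comparison serves either query direction. With short codes such as 8 bits, each comparison reduces to counting differing bits.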



Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61472216.

Author information


Corresponding author

Correspondence to Xue Wang.


About this article


Cite this article

Dai, P., Wang, X., Zhang, W. et al. Implicit relative attribute enabled cross-modality hashing for face image-video retrieval. Multimed Tools Appl 77, 23547–23577 (2018). https://doi.org/10.1007/s11042-018-5684-3


  • DOI: https://doi.org/10.1007/s11042-018-5684-3
