Learning Multifunctional Binary Codes for Personalized Image Retrieval

Published in: International Journal of Computer Vision

Abstract

Due to the highly complex semantic content of images, the expected content-based retrieval results for the same query image can differ substantially across scenarios, i.e., they are personalized. However, most existing hashing methods preserve only a single type of semantic similarity, making them incapable of addressing such realistic retrieval tasks. To deal with this problem, we propose a unified hashing framework that encodes multiple types of information into the binary codes by exploiting convolutional neural networks (CNNs). Specifically, we assume that typical retrieval tasks are generally defined in two aspects, i.e., high-level semantics (e.g., object categories) and visual attributes (e.g., object shape and color). To this end, our Dual Purpose Hashing (DPH) model is trained to jointly preserve two kinds of similarities characterizing these two aspects. Moreover, since images with both category and attribute labels are scarce, our model is carefully designed to leverage abundant partially labelled data as training inputs to alleviate the risk of overfitting. With such a framework, the binary codes of newly arriving images can be readily obtained by quantizing the outputs of a specific CNN layer, and different retrieval tasks can be accomplished by using the binary codes in different ways. Experiments on two large-scale datasets show that our method achieves comparable or even better performance than state-of-the-art methods specifically designed for each individual retrieval task, while using more compact binary codes than the compared methods.
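To make the retrieval mechanism concrete, the sketch below (not taken from the paper; the functions embed and binarize, the code length, and the bit split at position 32 are all hypothetical placeholders) illustrates the general idea of quantizing real-valued activations of a network layer into binary codes and ranking a database by Hamming distance, with an optional restriction to a subset of bits standing in for an attribute-oriented query.

```python
import numpy as np

# Hypothetical stand-in for the real-valued outputs of a hash layer of a trained
# network; in the actual DPH model these would come from a CNN forward pass.
def embed(images, code_length=64):
    rng = np.random.default_rng(seed=0)
    return rng.standard_normal((len(images), code_length))

def binarize(activations):
    # Quantize real-valued activations into binary codes by thresholding at zero.
    return (activations > 0).astype(np.uint8)

def hamming_distances(query_code, database_codes):
    # Number of differing bits between the query code and every database code.
    return np.count_nonzero(query_code != database_codes, axis=1)

# Toy database of 1000 image identifiers plus one query image (placeholders).
database_images = [f"img_{i:04d}.jpg" for i in range(1000)]
query_image = ["query.jpg"]

db_codes = binarize(embed(database_images))
query_code = binarize(embed(query_image))[0]

# Category-oriented retrieval: rank the whole database by full-code Hamming distance.
category_ranking = np.argsort(hamming_distances(query_code, db_codes))
print("Top-5 category-level matches:", category_ranking[:5])

# Attribute-oriented retrieval (illustrative only): if some bit positions were
# known to carry attribute information, the comparison could use just those bits.
attribute_bits = np.arange(32, 64)  # hypothetical attribute-related positions
attribute_ranking = np.argsort(
    hamming_distances(query_code[attribute_bits], db_codes[:, attribute_bits]))
print("Top-5 attribute-level matches:", attribute_ranking[:5])
```

In the actual method, of course, the two kinds of similarity are learned jointly during training rather than assigned to fixed bit positions after the fact; the sketch only shows how a single set of binary codes can serve more than one retrieval task at query time.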


Notes

  1. The source code of DPH and the ImageNet-150K dataset are available at http://vipl.ict.ac.cn/resources/codes.

  2. Using the source code released by the original authors, we trained dozens of HashNet models under different hyperparameter settings and report the best results among these models.

References

  • Al-Halah, Z., Lehrmann, A. M., & Sigal, L. (2018). Towards traversing the continuous spectrum of image retrieval. arXiv preprint arXiv:1812.00202.

  • Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network dissection: Quantifying interpretability of deep visual representations. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp 3319–3327).

  • Cakir, F., He, K., & Sclaroff, S. (2018). Hashing with binary matrix pursuit. In: Proceedings of the European conference on computer vision (ECCV) (pp. 332–348).

  • Cao, J., Li, Y., & Zhang, Z. (2018). Partially shared multi-task convolutional neural network with local constraint for face attribute learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4290–4299).

  • Cao, Y., Liu, B., Long, M., & Wang, J. (2018). HashGAN: Deep learning to hash with pair conditional Wasserstein GAN. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1287–1296).

  • Cao, Y., Long, M., Liu, B., & Wang, J. (2018). Deep Cauchy hashing for Hamming space retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1229–1237).

  • Cao, Z., Long, M., Wang, J., & Yu, P. S. (2017). HashNet: Deep learning to hash by continuation. In: Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 5609–5618).

  • Deng, C., Chen, Z., Liu, X., Gao, X., & Tao, D. (2018). Triplet-based deep hashing network for cross-modal retrieval. IEEE Transactions on Image Processing (TIP), 27(8), 3893–3903.

  • Escorcia, V., Niebles, J. C., & Ghanem, B. (2015). On the relationship between visual attributes and convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1256–1264).

  • Gionis, A., Indyk, P., & Motwani, R. (1999). Similarity search in high dimensions via hashing. Very Large Data Base (VLDB), 99, 518–529.

  • Gong, Y., & Lazebnik, S. (2011). Iterative quantization: A Procrustean approach to learning binary codes. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 817–824).

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 2961–2969).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778).

  • Hu, R., Xu, H., Rohrbach, M., Feng, J., Saenko, K., & Darrell, T. (2016). Natural language object retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4555–4564).

  • Huang, C., Loy, C. C., & Tang, X. (2016). Unsupervised learning of discriminative attributes and visual representations. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5175–5184).

  • Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In: International conference on multimedia (MM) (pp. 675–678).

  • Jiang, Q. Y., Cui, X., & Li, W. J. (2018). Deep discrete supervised hashing. IEEE Transactions on Image Processing (TIP), 27(12), 5996–6009.

  • Jiang, Q. Y., & Li, W. J. (2017). Deep cross-modal hashing. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3232–3240).

  • Kokkinos, I. (2017). UberNet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5454–5463).

  • Kovashka, A., & Grauman, K. (2013). Attribute adaptation for personalized image search. In: Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 3432–3439).

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems (NIPS) (pp. 1097–1105).

  • Kulis, B., & Darrell, T. (2009). Learning to hash with binary reconstructive embeddings. In: Advances in neural information processing systems (NIPS) (pp. 1042–1050).

  • Kumar, N., Belhumeur, P., & Nayar, S. (2008). FaceTracer: A search engine for large collections of images with faces. In: European conference on computer vision (ECCV) (pp. 340–353).

  • Lai, H., Pan, Y., Liu, Y., & Yan, S. (2015). Simultaneous feature learning and hash coding with deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3270–3278).

  • Li, Y., Wang, R., Liu, H., Jiang, H., Shan, S., & Chen, X. (2015). Two birds, one stone: Jointly learning binary code for large-scale face image retrieval and attributes prediction. In: Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 3819–3827).

  • Liu, H., Wang, R., Shan, S., & Chen, X. (2016). Deep supervised hashing for fast image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2064–2072).

  • Liu, H., Wang, R., Shan, S., & Chen, X. (2017). Learning multifunctional binary codes for both category and attribute oriented retrieval tasks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 6259–6268).

  • Liu, L., Chen, J., Fieguth, P., Zhao, G., Chellappa, R., & Pietikäinen, M. (2019). From BoW to CNN: Two decades of texture representation for texture classification. International Journal of Computer Vision, 127(1), 74–109.

  • Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., et al. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2), 261–318.

  • Liu, W., Wang, J., Ji, R., Jiang, Y. G., & Chang, S. F. (2012). Supervised hashing with kernels. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2074–2081).

  • Liu, X., He, J., Deng, C., & Lang, B. (2014). Collaborative hashing. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2139–2146).

  • Liu, X., He, J., Lang, B., & Chang, S. F. (2013). Hash bit selection: a unified solution for selection problems in hashing. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1570–1577).

  • Liu, X., Huang, L., Deng, C., Lu, J., & Lang, B. (2015). Multi-view complementary hash tables for nearest neighbor search. In: Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 1107–1115).

  • Liu, Z., Luo, P., Qiu, S., Wang, X., & Tang, X. (2016). DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1096–1104).

  • Long, Y., Liu, L., Shen, Y., & Shao, L. (2018). Towards affordable semantic searching: Zero-shot retrieval via dominant attributes. In: Thirty-Second AAAI conference on artificial intelligence.

  • Norouzi, M., & Fleet, D. J. (2011). Minimal loss hashing for compact binary codes. In: International conference on machine learning (ICML) (pp. 353–360).

  • Parikh, D., & Grauman, K. (2011). Interactively building a discriminative vocabulary of nameable attributes. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1681–1688).

  • Patterson, G., Xu, C., Su, H., & Hays, J. (2014). The SUN attribute database: Beyond categories for deeper scene understanding. International Journal of Computer Vision (IJCV), 108(1–2), 59–81.

  • Rastegari, M., Diba, A., Parikh, D., & Farhadi, A. (2013). Multi-attribute queries: To merge or not to merge? In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3310–3317).

  • Rastegari, M., Farhadi, A., & Forsyth, D. (2012). Attribute discovery via predictable discriminative binary codes. In: European conference on computer vision (ECCV) (pp. 876–889).

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.

  • Sadovnik, A., Gallagher, A., Parikh, D., & Chen, T. (2013). Spoken attributes: Mixing binary and relative attributes to say the right thing. In: Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 2160–2167).

  • Scheirer, W. J., Kumar, N., Belhumeur, P. N., & Boult, T. E. (2012). Multi-attribute spaces: Calibration for attribute fusion and similarity search. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2933–2940).

  • Shen, F., Shen, C., Liu, W., & Tao Shen, H. (2015). Supervised discrete hashing. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 37–45).

  • Shen, L., Lin, Z., & Huang, Q. (2016). Relay backpropagation for effective learning of deep convolutional neural networks. In: European conference on computer vision (ECCV) (pp. 467–482).

  • Siddiquie, B., Feris, R. S., & Davis, L. S. (2011). Image ranking and retrieval based on multi-attribute queries. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 801–808).

  • Sun, Y., Chen, Y., Wang, X., & Tang, X. (2014). Deep learning face representation by joint identification-verification. In: Advances in neural information processing systems (NIPS) (pp. 1988–1996).

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–9).

  • Tao, R., Smeulders, A. W. M., & Chang, S. F. (2015). Attributes and categories for generic instance search from one example. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 177–186).

  • Turakhia, N., & Parikh, D. (2013). Attribute dominance: What pops out? In: Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 1225–1232).

  • Veit, A., Belongie, S., & Karaletsos, T. (2017). Conditional similarity networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 830–838).

  • Wang, J., Kumar, S., & Chang, S. F. (2012). Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 34(12), 2393–2406.

  • Wang, L., Lee, C. Y., Tu, Z., & Lazebnik, S. (2015). Training deeper convolutional networks with deep supervision. arXiv preprint arXiv:1505.02496.

  • Weiss, Y., Torralba, A., & Fergus, R. (2008). Spectral hashing. In: Advances in neural information processing systems (NIPS) (pp. 1753–1760).

  • Xia, R., Pan, Y., Lai, H., Liu, C., & Yan, S. (2014). Supervised hashing for image retrieval via image representation learning. In: Twenty-eighth AAAI conference on artificial intelligence.

  • Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3485–3492).

  • Yang, E., Deng, C., Liu, W., Liu, X., Tao, D., & Gao, X. (2017). Pairwise relationship guided deep hashing for cross-modal retrieval. In: Thirty-first AAAI conference on artificial intelligence.

  • Yang, H. F., Lin, K., & Chen, C. S. (2015). Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 40(2), 27–35.

  • Yi, D., Lei, Z., Liao, S., & Li, S. Z. (2014). Learning face representation from scratch. arXiv preprint arXiv:1411.7923.

  • Yu, F. X., Ji, R., Tsai, M. H., Ye, G., & Chang, S. F. (2012). Weak attributes for large-scale image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2949–2956).

  • Zamir, A. R., Sax, A., Shen, W., Guibas, L. J., Malik, J., & Savarese, S. (2018). Taskonomy: Disentangling task transfer learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3712–3722).

  • Zhang, R., Lin, L., Zhang, R., Zuo, W., & Zhang, L. (2015). Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE Transactions on Image Processing (TIP), 24(12), 4766–4779.

  • Zhang, X., Zhang, L., Wang, X. J., & Shum, H. Y. (2012). Finding celebrities in billions of web images. IEEE Transactions on Multimedia (TMM), 14(4), 995–1007.

  • Zhang, Z., Chen, Y., & Saligrama, V. (2016). Efficient training of very deep neural networks for supervised hashing. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1487–1495).

  • Zhao, F., Huang, Y., Wang, L., & Tan, T. (2015). Deep semantic ranking based hashing for multi-label image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1556–1564).

  • Zhong, Y., Sullivan, J., & Li, H. (2016). Face attribute prediction with classification CNN. arXiv preprint arXiv:1602.01827.

  • Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In: Advances in neural information processing systems (NIPS) (pp. 487–495).

Acknowledgements

This work was done at the Institute of Computing Technology, Chinese Academy of Sciences, where Haomiao Liu pursued his PhD degree. It was partially supported by the 973 Program under Contract No. 2015CB351802, the Natural Science Foundation of China under Contracts No. 61390511 and No. 61772500, the CAS Frontier Science Key Research Project No. QYZDJ-SSWJSC009, and the Youth Innovation Promotion Association No. 2015085.

Author information

Corresponding author

Correspondence to Ruiping Wang.

Additional information

Communicated by Li Liu, Matti Pietikäinen, Jie Qin, Jie Chen, Wanli Ouyang, Luc Van Gool.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, H., Wang, R., Shan, S. et al. Learning Multifunctional Binary Codes for Personalized Image Retrieval. Int J Comput Vis 128, 2223–2242 (2020). https://doi.org/10.1007/s11263-020-01315-0
