Class Concentration with Twin Variational Autoencoders for Unsupervised Cross-Modal Hashing

  • Conference paper
  • Computer Vision – ACCV 2022 (ACCV 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13846)


Abstract

Multi-modal deep hash learning is arguably one of the most commonly used unsupervised approaches to cross-modal retrieval. Most existing deep hashing methods focus on preserving similarity information during hash-code learning. Although they learn accurate and compact binary representations, these methods fail to encourage discriminative feature learning. In this paper, we propose a new method, Class Concentration with Twin Variational autoencoders (CCTV), to learn discriminative hash codes. The novelty of CCTV lies in two aspects. First, the method concentrates the mean vectors of the latent features: under the assumption that features in the shared latent space follow a multivariate Gaussian distribution, CCTV jointly updates the mean vectors and the cluster centroids of the latent features by minimizing a class-concentration loss, which narrows the distance between the cluster centroids and the mean vectors and thus makes each cluster more compact. Second, unlike previous works, CCTV uses the mean vectors of the latent features as the image representations, under the constraint of the raw similarity information, to reduce the influence of the variance, and then embeds them into the Hamming space. Our experimental evaluation on four multimedia benchmarks shows a significant improvement over state-of-the-art methods. Code is available at: https://github.com/theusernamealreadyexists/CCTV.
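To make the two components concrete, below is a minimal NumPy sketch of the class-concentration idea described in the abstract. The hard nearest-centroid assignment, the sign-based binarization, and all variable names and sizes are illustrative assumptions rather than the authors' implementation (see the linked repository for that).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: N samples, d-dim shared latent space, K clusters.
N, d, K = 512, 64, 10

mu = rng.normal(size=(N, d))         # encoder mean vectors (random stand-ins)
centroids = rng.normal(size=(K, d))  # cluster centroids, learned jointly in the paper

def class_concentration_loss(mu, centroids):
    # Pairwise squared Euclidean distances, shape (N, K).
    d2 = ((mu[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment to the closest centroid; a soft assignment
    # (e.g. Student's t, as in deep embedded clustering) would also fit here.
    nearest = d2.argmin(axis=1)
    # Mean squared distance between each mean vector and its centroid;
    # minimizing this pulls means and centroids together, tightening clusters.
    return d2[np.arange(len(mu)), nearest].mean()

# The mean vectors (not sampled latents) are then binarized into Hamming
# space; a sign function after zero-centering is the simplest choice.
codes = np.sign(mu - mu.mean(axis=0))

print(class_concentration_loss(mu, centroids))  # scalar loss value
print(codes.shape)                              # (512, 64) binary codes
```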



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 61872187, No. 62077023 and No. 62072246, in part by the Natural Science Foundation of Jiangsu Province under Grant No. BK20201306, and in part by the “111” Program under Grant No. B13022.

Author information

Correspondence to Haofeng Zhang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhao, Y., Zhu, Y., Liao, S., Ye, Q., Zhang, H. (2023). Class Concentration with Twin Variational Autoencoders for Unsupervised Cross-Modal Hashing. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13846. Springer, Cham. https://doi.org/10.1007/978-3-031-26351-4_15


  • DOI: https://doi.org/10.1007/978-3-031-26351-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26350-7

  • Online ISBN: 978-3-031-26351-4

  • eBook Packages: Computer Science, Computer Science (R0)
