DOI: 10.1145/3460426.3463626
Research article

Unsupervised Deep Cross-Modal Hashing by Knowledge Distillation for Large-scale Cross-modal Retrieval

Published: 01 September 2021

ABSTRACT

Cross-modal hashing (CMH) maps heterogeneous multi-modal data into compact binary codes to enable fast and flexible retrieval across modalities, which is especially valuable in large-scale retrieval. Because it does not require extensive manual annotation, unsupervised cross-modal hashing has broader application prospects than supervised methods. However, existing unsupervised methods struggle to achieve satisfactory performance due to the lack of reliable supervisory information. To address this problem, and inspired by knowledge distillation, we propose a novel unsupervised Knowledge Distillation Cross-Modal Hashing method (KDCMH), which uses similarity information distilled from an unsupervised method to guide a supervised method. Specifically, the teacher model first adopts an unsupervised distribution-based similarity hashing method that constructs a modal-fusion similarity matrix. Then, under the supervision of the teacher model's distilled information, the student model generates more discriminative hash codes. Extensive experiments on two public datasets, NUS-WIDE and MIRFLICKR-25K, demonstrate that KDCMH achieves significant improvements over several representative unsupervised cross-modal hashing methods.
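To make the teacher-student pipeline above concrete, the following minimal PyTorch sketch illustrates the two stages: the teacher fuses intra-modal feature similarities into a modal-fusion matrix without any labels, and the student is trained so that the similarity implied by its relaxed hash codes matches that matrix. The fusion weight alpha, the tanh relaxation, and the mean-squared-error loss are illustrative assumptions, not the exact formulation of KDCMH.

    import torch
    import torch.nn.functional as F

    def fusion_similarity(img_feat, txt_feat, alpha=0.5):
        # Teacher: fuse intra-modal cosine similarities of unlabeled image
        # and text features (alpha is a hypothetical fusion weight).
        img = F.normalize(img_feat, dim=1)
        txt = F.normalize(txt_feat, dim=1)
        s_img = img @ img.t()                    # image-image similarity
        s_txt = txt @ txt.t()                    # text-text similarity
        return alpha * s_img + (1 - alpha) * s_txt

    def distillation_loss(img_codes, txt_codes, s_fused):
        # Student: align the similarity implied by relaxed (tanh) hash codes
        # of both modalities with the teacher's fused similarity matrix.
        k = img_codes.size(1)                    # code length in bits
        s_hash = torch.tanh(img_codes) @ torch.tanh(txt_codes).t() / k
        return F.mse_loss(s_hash, s_fused)

    # Toy usage: 8 image-text pairs, 512-d features, 32-bit codes.
    img_feat = torch.randn(8, 512)
    txt_feat = torch.randn(8, 512)
    s_teacher = fusion_similarity(img_feat, txt_feat)

    img_codes = torch.randn(8, 32, requires_grad=True)   # stand-ins for the
    txt_codes = torch.randn(8, 32, requires_grad=True)   # hashing networks' outputs
    loss = distillation_loss(img_codes, txt_codes, s_teacher)
    loss.backward()                              # would train both hashing networks

The point mirrored from the abstract is that the teacher requires no annotation, since the fused matrix comes purely from feature similarities, while the student treats it as pseudo-supervision when learning discriminative, binary-like codes.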

Published in

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
August 2021, 715 pages
ISBN: 9781450384636
DOI: 10.1145/3460426
Copyright © 2021 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rate: 254 of 830 submissions (31%)

