DOI: 10.1145/3460426.3463626
Research article

Unsupervised Deep Cross-Modal Hashing by Knowledge Distillation for Large-scale Cross-modal Retrieval

Published: 01 September 2021

ABSTRACT

Cross-modal hashing (CMH) maps heterogeneous multi-modal data into compact binary codes to enable fast and flexible retrieval across modalities, which is especially valuable in large-scale retrieval. Because it does not require extensive manual annotation, unsupervised cross-modal hashing has broader application prospects than supervised methods. However, existing unsupervised methods struggle to achieve satisfactory performance due to the lack of reliable supervisory information. To address this problem, and inspired by knowledge distillation, we propose a novel unsupervised Knowledge Distillation Cross-Modal Hashing method (KDCMH), which uses similarity information distilled from an unsupervised method to guide a supervised method. Specifically, the teacher model first adopts an unsupervised distribution-based similarity hashing method that constructs a modal-fusion similarity matrix. Then, under the supervision of the teacher model's distilled information, the student model generates more discriminative hash codes. Extensive experiments on two public datasets, NUS-WIDE and MIRFLICKR-25K, demonstrate that KDCMH achieves significant improvements over several representative unsupervised cross-modal hashing methods.
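To make the teacher-student pipeline above concrete, the following minimal PyTorch sketch illustrates the two stages: the teacher fuses intra-modal feature similarities into a modal-fusion matrix without any labels, and the student is trained so that the similarity implied by its relaxed hash codes matches that matrix. The fusion weight alpha, the tanh relaxation, and the mean-squared-error loss are illustrative assumptions, not the exact formulation of KDCMH.

    import torch
    import torch.nn.functional as F

    def fusion_similarity(img_feat, txt_feat, alpha=0.5):
        # Teacher: fuse intra-modal cosine similarities of unlabeled image
        # and text features (alpha is a hypothetical fusion weight).
        img = F.normalize(img_feat, dim=1)
        txt = F.normalize(txt_feat, dim=1)
        s_img = img @ img.t()                    # image-image similarity
        s_txt = txt @ txt.t()                    # text-text similarity
        return alpha * s_img + (1 - alpha) * s_txt

    def distillation_loss(img_codes, txt_codes, s_fused):
        # Student: align the similarity implied by relaxed (tanh) hash codes
        # of both modalities with the teacher's fused similarity matrix.
        k = img_codes.size(1)                    # code length in bits
        s_hash = torch.tanh(img_codes) @ torch.tanh(txt_codes).t() / k
        return F.mse_loss(s_hash, s_fused)

    # Toy usage: 8 image-text pairs, 512-d features, 32-bit codes.
    img_feat = torch.randn(8, 512)
    txt_feat = torch.randn(8, 512)
    s_teacher = fusion_similarity(img_feat, txt_feat)

    img_codes = torch.randn(8, 32, requires_grad=True)   # stand-ins for the
    txt_codes = torch.randn(8, 32, requires_grad=True)   # hashing networks' outputs
    loss = distillation_loss(img_codes, txt_codes, s_teacher)
    loss.backward()                              # would train both hashing networks

The point mirrored from the abstract is that the teacher requires no annotation, since the fused matrix comes purely from feature similarities, while the student treats it as pseudo-supervision when learning discriminative, binary-like codes.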

Published in

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
August 2021, 715 pages
ISBN: 9781450384636
DOI: 10.1145/3460426
Copyright © 2021 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rate: 254 of 830 submissions (31%)

