ABSTRACT
Cross-modal hashing (CMH) maps heterogeneous data from multiple modalities into compact binary codes, enabling fast and flexible retrieval across modalities, especially in large-scale settings. Because it requires no costly manual annotation, unsupervised cross-modal hashing has broader application prospects than supervised methods. However, existing unsupervised methods struggle to achieve satisfactory performance due to the lack of reliable supervisory information. To address this problem, inspired by knowledge distillation, we propose a novel unsupervised Knowledge Distillation Cross-Modal Hashing method (KDCMH), which uses similarity information distilled from an unsupervised method to guide a supervised method. Specifically, the teacher model first adopts an unsupervised distribution-based similarity hashing method to construct a modal-fusion similarity matrix. Then, supervised by the information distilled from the teacher model, the student model generates more discriminative hash codes. Extensive experiments on two public datasets, NUS-WIDE and MIRFLICKR-25K, demonstrate that KDCMH significantly outperforms several representative unsupervised cross-modal hashing methods.
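The teacher–student scheme sketched in the abstract can be illustrated with a minimal NumPy example. This is a sketch under assumptions, not the paper's exact formulation: the use of cosine similarity, the weighted-average fusion, and the Frobenius-norm distillation objective are illustrative stand-ins for KDCMH's actual distribution-based similarity construction and training loss.

```python
import numpy as np

def cosine_similarity_matrix(feats):
    # Row-normalize features; S = F F^T then holds pairwise cosine similarities.
    norm = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return norm @ norm.T

def fused_similarity(img_feats, txt_feats, alpha=0.5):
    """Teacher side (illustrative): fuse the intra-modal similarity
    matrices of the image and text modalities into a single matrix
    that serves as pseudo-supervision for the student."""
    s_img = cosine_similarity_matrix(img_feats)
    s_txt = cosine_similarity_matrix(txt_feats)
    return alpha * s_img + (1.0 - alpha) * s_txt

def distillation_loss(hash_codes, s_fused):
    """Student side (illustrative): push the scaled inner products of
    the learned hash codes toward the distilled similarity matrix."""
    k = hash_codes.shape[1]                 # code length
    s_hat = hash_codes @ hash_codes.T / k   # in [-1, 1] for {-1, +1} codes
    return np.mean((s_hat - s_fused) ** 2)

# Toy usage with random features standing in for deep image/text features.
rng = np.random.default_rng(0)
img_feats = rng.standard_normal((8, 16))
txt_feats = rng.standard_normal((8, 32))
s_fused = fused_similarity(img_feats, txt_feats)
codes = np.sign(rng.standard_normal((8, 4)))  # candidate {-1, +1} hash codes
loss = distillation_loss(codes, s_fused)
```

In a real system the student would minimize such a loss by gradient descent through a continuous relaxation of the sign function; here the random codes only demonstrate how the fused matrix supervises the code-similarity structure.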
Index Terms
- Unsupervised Deep Cross-Modal Hashing by Knowledge Distillation for Large-scale Cross-modal Retrieval