Abstract
Cross-modal hashing for large-scale approximate nearest neighbor search has recently attracted great attention because of its significant computational and storage efficiency. However, it remains challenging to generate high-quality binary codes that preserve inter-modal and intra-modal semantics, especially in a semi-supervised manner. In this paper, we propose a semi-supervised cross-modal discrete code learning framework. To the best of our knowledge, this is the first work to apply asymmetric graph convolutional networks (GCNs) to scalable cross-modal retrieval. Specifically, the architecture contains multiple GCN branches, one per data modality, each of which extracts modality-specific features and then generates unified binary hash codes shared across modalities, so that the underlying correlations and similarities across modalities are simultaneously preserved in the hash values. Moreover, the branches are built with asymmetric graph convolutional layers, which employ randomly sampled anchors to tackle the scalability and out-of-sample issues in graph learning and to reduce the complexity of cross-modal similarity calculation. Extensive experiments on benchmark datasets demonstrate that our method achieves superior retrieval performance compared with state-of-the-art methods.
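To make the anchor-based construction described above concrete, the following is a minimal, hypothetical PyTorch sketch of an asymmetric (anchor-graph) convolution layer and a per-modality branch. The class names (AnchorGraphConv, ModalityBranch), the RBF affinity, and all dimensions and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnchorGraphConv(nn.Module):
    """Graph convolution over an anchor graph: affinities are computed only
    between the N samples and m randomly sampled anchors, so a layer costs
    O(N * m) instead of O(N^2), and unseen (out-of-sample) points can be
    encoded by relating them to the same anchors."""

    def __init__(self, in_dim, out_dim, num_anchors=300, bandwidth=1.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.num_anchors = num_anchors
        self.bandwidth = bandwidth

    def forward(self, x):
        # Randomly sample anchors from the current node features (assumption).
        m = min(self.num_anchors, x.size(0))
        anchors = x[torch.randperm(x.size(0))[:m]]
        # Row-normalised RBF affinities Z between samples and anchors (N x m).
        z = F.softmax(-torch.cdist(x, anchors) ** 2 / self.bandwidth, dim=1)
        # Anchor-graph propagation A @ x with A = Z diag(Z^T 1)^{-1} Z^T,
        # evaluated right-to-left so the full N x N matrix is never formed.
        lam = z.sum(dim=0).clamp(min=1e-8)
        agg = z @ ((z.t() @ x) / lam.unsqueeze(1))
        return self.linear(agg)


class ModalityBranch(nn.Module):
    """One GCN branch per modality (e.g. image or text). The final tanh gives
    relaxed codes in (-1, 1); sign() at retrieval time yields binary codes."""

    def __init__(self, in_dim, hidden_dim, code_len, num_anchors=300):
        super().__init__()
        self.gc1 = AnchorGraphConv(in_dim, hidden_dim, num_anchors)
        self.gc2 = AnchorGraphConv(hidden_dim, code_len, num_anchors)

    def forward(self, x):
        h = F.relu(self.gc1(x))
        return torch.tanh(self.gc2(h))


if __name__ == "__main__":
    # Toy usage: 4096-d image features and 1386-d text features mapped into a
    # shared 32-bit code space; all sizes here are illustrative only.
    img_branch = ModalityBranch(4096, 512, 32)
    txt_branch = ModalityBranch(1386, 512, 32)
    img_codes = torch.sign(img_branch(torch.randn(1000, 4096)))
    txt_codes = torch.sign(txt_branch(torch.randn(1000, 1386)))
    print(img_codes.shape, txt_codes.shape)  # both (1000, 32)
```

Because each layer only relates samples to the m anchors, encoding a new query in this sketch reduces to computing its m anchor affinities, which is how the scalability and out-of-sample issues are sidestepped.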
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Duan, J., Luo, Y., Wang, Z., Huang, Z. (2020). Semi-supervised Cross-Modal Hashing with Graph Convolutional Networks. In: Borovica-Gajic, R., Qi, J., Wang, W. (eds) Databases Theory and Applications. ADC 2020. Lecture Notes in Computer Science, vol. 12008. Springer, Cham. https://doi.org/10.1007/978-3-030-39469-1_8
DOI: https://doi.org/10.1007/978-3-030-39469-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39468-4
Online ISBN: 978-3-030-39469-1
eBook Packages: Computer Science, Computer Science (R0)