Abstract
Cross-modal hashing is an effective and practical technique for large-scale multimedia retrieval. Among cross-modal hashing methods, unsupervised hashing has attracted particular attention because unlabeled data are easy to collect. However, despite a rich line of work in this area, existing methods share a common limitation: the training data must be organized in pairs that connect the modalities (e.g., an image and a text that carry the same semantic information), so learning cannot proceed when no pair-wise information is available. To overcome this limitation, we design a Completely Unsupervised Cross-Modal Hashing (CUCMH) approach that relies on nothing but numeric features, i.e., neither class labels nor pair-wise information. To the best of our knowledge, this is the first work to address this setting, for which we propose a novel dual-branch generative adversarial network. We further introduce the idea that the representation of multimedia data can be separated into content and style components, and we employ modality representation codes to improve the effectiveness of the generative adversarial learning. Extensive experiments demonstrate that CUCMH outperforms existing methods on completely unsupervised cross-modal hashing tasks, and confirm the effectiveness of integrating modality representation with semantic information in representation learning.
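To make the retrieval setting concrete, the sketch below illustrates the general cross-modal hashing pipeline the abstract builds on (not the paper's CUCMH model): features from two modalities are projected into a shared Hamming space and binarized, so that retrieval reduces to fast binary-code comparison. The projection matrices, feature dimensions, and code length here are hypothetical placeholders; in practice they would be learned, e.g., by an adversarial network as proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(features, projection):
    """Binarize projected features into {0, 1} hash codes via the sign."""
    return (features @ projection > 0).astype(np.uint8)

def hamming_distance(a, b):
    """Pairwise Hamming distances between rows of two code matrices."""
    return np.count_nonzero(a[:, None, :] != b[None, :, :], axis=-1)

# Hypothetical setup: 512-d image features, 300-d text features, 32-bit codes.
# Random projections stand in for the learned hash functions.
W_img = rng.standard_normal((512, 32))
W_txt = rng.standard_normal((300, 32))

img_feats = rng.standard_normal((5, 512))   # database images
txt_feats = rng.standard_normal((3, 300))   # text queries

img_codes = hash_codes(img_feats, W_img)
txt_codes = hash_codes(txt_feats, W_txt)

# Retrieval: for each text query, rank database images by Hamming distance.
dist = hamming_distance(txt_codes, img_codes)   # shape (3, 5)
ranking = np.argsort(dist, axis=1)
```

The point of the binary codes is efficiency: comparing 32-bit codes is far cheaper than comparing the original dense feature vectors, which is what makes hashing attractive at multimedia scale.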
Acknowledgement
This work was partially supported by Australian Research Council Discovery Project (ARC DP190102353).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Duan, J., Zhang, P., Huang, Z. (2020). Completely Unsupervised Cross-Modal Hashing. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science(), vol 12112. Springer, Cham. https://doi.org/10.1007/978-3-030-59410-7_11
Print ISBN: 978-3-030-59409-1
Online ISBN: 978-3-030-59410-7