Abstract
Cross-modal hashing can effectively address large-scale cross-modal retrieval by combining the advantages of traditional cross-modal analysis and hashing techniques. In cross-modal hashing, preserving semantic correlation is important but challenging, and current hashing methods do not preserve it well in the hash codes. Supervised hashing requires labeled data, which is difficult to obtain, and unsupervised hashing cannot effectively learn semantic correlation from multi-modal data. To learn semantic correlation effectively and thereby improve hashing performance, we propose a novel approach, Semi-Supervised Semantic Factorization Hashing (S3FH), for large-scale cross-modal retrieval. The main idea of S3FH is to improve the semantic labels and factorize them into hash codes. It optimizes a joint framework consisting of three interacting parts: semantic factorization, multi-graph learning, and multi-modal correlation. An efficient alternating algorithm is derived to optimize S3FH. Extensive experiments on two real-world multi-modal datasets demonstrate the effectiveness of S3FH.
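As a rough illustration of why hash codes make cross-modal retrieval fast, the sketch below ranks items of one modality against a query from another by Hamming distance. It shows only the generic retrieval step shared by hashing methods, not the S3FH learning procedure; the function names and the 8-bit codes are hypothetical and chosen purely for illustration.

```python
# Illustrative only: generic Hamming-distance ranking over binary hash codes.
# This is NOT the S3FH learning procedure; all codes below are made up.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: dict) -> list:
    """Return database item ids sorted by Hamming distance to the query.

    Ties are broken by item id so the ordering is deterministic.
    """
    return sorted(
        database_codes,
        key=lambda item_id: (hamming(query_code, database_codes[item_id]), item_id),
    )


# Hypothetical 8-bit codes: one text query and three images that are
# assumed to have been mapped into the same Hamming space.
text_query = 0b10110010
image_codes = {
    "img_a": 0b10110011,  # differs from the query in 1 bit
    "img_b": 0b01001101,  # differs from the query in all 8 bits
    "img_c": 0b10100010,  # differs from the query in 1 bit
}

ranking = rank_by_hamming(text_query, image_codes)
```

Because each comparison is an XOR followed by a bit count, ranking cost grows with the code length rather than with the dimensionality of the original image or text features.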



Cite this article
Wang, J., Li, G., Pan, P. et al. Semi-supervised semantic factorization hashing for fast cross-modal retrieval. Multimed Tools Appl 76, 20197–20215 (2017). https://doi.org/10.1007/s11042-017-4567-3
DOI: https://doi.org/10.1007/s11042-017-4567-3