ABSTRACT
Recently, multimodal hashing techniques have received considerable attention due to their low storage cost and fast query speed for multimodal data retrieval. Many methods have been proposed; however, some problems still need further consideration. For example, some of these methods use only a similarity matrix to learn hash functions, which discards useful information contained in the original data. Some relax the binary constraints, or separate the learning of hash functions and binary codes into two independent stages, to bypass the difficulty of optimizing under discrete constraints on the binary codes, which may introduce large quantization error. Others are not robust to noise. All of these problems may degrade the performance of a model. To address them, in this paper we propose a novel supervised hashing framework for cross-modal retrieval, Supervised Robust Discrete Multimodal Hashing (SRDMH). Specifically, SRDMH aims to make the final binary codes preserve the same label information as the original data, so that more label information can supervise the learning of the binary codes. In addition, it learns the hash functions and binary codes directly instead of relaxing the binary constraints, which avoids large quantization error. Moreover, to make the model robust and easy to solve, we further integrate a flexible ℓ2,p loss with a nonlinear kernel embedding and an intermediate representation of each instance. Finally, an alternating algorithm is proposed to solve the optimization problem in SRDMH. Extensive experiments are conducted on three benchmark data sets. The results demonstrate that SRDMH outperforms or is comparable to several state-of-the-art methods on the cross-modal retrieval task.
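Three ingredients named in the abstract can be illustrated in isolation: the ℓ2,p loss, a nonlinear kernel embedding, and sign-based binary codes compared by Hamming distance. The sketch below is a minimal illustration under stated assumptions, not the SRDMH implementation. The loss is written as ||E||_{2,p}^p = Σ_i ||e_i||_2^p with one residual row per instance. The RBF kernel with anchor points, the bandwidth `sigma`, the sign thresholding rule, and the function names `l2p_loss`, `rbf_embed`, `hash_codes`, and `hamming` are hypothetical choices (RBF features over anchors are a common choice in kernel-based supervised hashing, but the abstract does not specify the kernel). The label supervision, the intermediate representation, and the alternating discrete solver are not shown.

```python
import math
from typing import List

Vector = List[float]


def l2p_loss(residuals: List[Vector], p: float = 1.0) -> float:
    """Return ||E||_{2,p}^p = sum_i ||e_i||_2^p, one residual row e_i per instance.

    With 0 < p < 2, an instance with a large residual contributes less than it
    would under the squared loss (p = 2), which is the usual robustness argument.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(math.sqrt(sum(v * v for v in row)) ** p for row in residuals)


def rbf_embed(x: Vector, anchors: List[Vector], sigma: float) -> Vector:
    """Assumed nonlinear embedding: RBF similarity of x to each anchor point."""
    return [
        math.exp(-sum((xi - ai) ** 2 for xi, ai in zip(x, a)) / (2.0 * sigma ** 2))
        for a in anchors
    ]


def hash_codes(feature: Vector, weights: List[Vector]) -> List[int]:
    """One bit per weight vector: +1 if the projection is >= 0, else -1."""
    return [1 if sum(w * f for w, f in zip(wb, feature)) >= 0 else -1 for wb in weights]


def hamming(b1: List[int], b2: List[int]) -> int:
    """Number of positions where two {-1, +1} codes differ."""
    return sum(1 for u, v in zip(b1, b2) if u != v)


if __name__ == "__main__":
    # Three instances; the last residual row plays the role of a noisy outlier.
    residuals = [[0.3, 0.4], [0.0, 1.0], [6.0, 8.0]]
    for p in (2.0, 1.0, 0.5):
        total = l2p_loss(residuals, p)
        outlier_share = 10.0 ** p / total  # the outlier row has l2 norm 10
        print(f"p={p}: loss={total:.3f}, outlier share={outlier_share:.2f}")

    # Toy retrieval with hypothetical (not learned) anchors and projection weights.
    anchors = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
    weights = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0], [-1.0, 0.0, 1.0]]
    query = hash_codes(rbf_embed([0.1, 0.0], anchors, sigma=1.0), weights)
    database = {"a": [0.0, 0.1], "b": [2.0, 0.1]}
    for name, x in database.items():
        code = hash_codes(rbf_embed(x, anchors, sigma=1.0), weights)
        print(name, code, "Hamming distance to query:", hamming(query, code))
```

Running the script shows the outlier's share of the total loss shrinking as p decreases, and the database item close to the query in feature space receiving the nearer binary code. In SRDMH the projection and the codes would instead be learned jointly under label supervision.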
Index Terms
- Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval
Recommendations
Asymmetric Discrete Cross-Modal Hashing
ICMR '18: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval
Recently, cross-modal hashing (CMH) methods have attracted much attention. Many methods have been explored; however, there are still some issues that need to be further considered. 1) How to efficiently construct the correlations among heterogeneous ...
Semi-Relaxation Supervised Hashing for Cross-Modal Retrieval
MM '17: Proceedings of the 25th ACM international conference on Multimedia
Recently, some cross-modal hashing methods have been devised for cross-modal search task. Essentially, given a similarity matrix, most of these methods tackle a discrete optimization problem by separating it into two stages, i.e., first relaxing the ...
Supervised Hierarchical Deep Hashing for Cross-Modal Retrieval
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
Cross-modal hashing has attracted much attention in the large-scale multimedia search area. In many real applications, labels of samples have hierarchical structure which also contains much useful information for learning. However, most existing methods ...