research-article

Label Consistent Flexible Matrix Factorization Hashing for Efficient Cross-modal Retrieval

Published: 22 July 2021

Abstract

Hashing methods have sparked a revolution in large-scale cross-media search because of their effectiveness and efficiency. Most existing approaches learn a unified hash representation in a common Hamming space to represent all multimodal data. However, unified hash codes may not characterize cross-modal data discriminatively, because the modalities can differ greatly in dimensionality, physical properties, and statistical characteristics. In addition, most existing supervised cross-modal algorithms preserve similarity relationships by constructing an n×n pairwise similarity matrix, which is computationally expensive and discards category information. To mitigate these issues, this article proposes a novel cross-media hashing approach, dubbed label flexible matrix factorization hashing (LFMH). Specifically, LFMH jointly learns modality-specific latent subspaces with shared semantics via flexible matrix factorization. Moreover, LFMH guides hash learning with the semantic labels directly instead of the large n×n pairwise similarity matrix. LFMH transforms the heterogeneous data into modality-specific latent semantic representations, so the hash codes can be obtained by quantizing these representations, and the learned hash codes are consistent with the supervised labels of the multimodal data. Similar samples within each modality thus receive similar binary codes, and these codes characterize the samples flexibly. Accordingly, the derived hash codes have greater discriminative power for both single-modal and cross-modal retrieval tasks. Extensive experiments on eight databases demonstrate that our model outperforms several competitive approaches.
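
To make the idea concrete, the following minimal numpy sketch illustrates label-guided matrix factorization hashing in the spirit described above. The variable names (Ux, Uy, Vx, Vy, W), regularization weights, and closed-form updates are illustrative assumptions rather than the paper's exact formulation: each modality is factorized into a basis and a modality-specific latent representation, the latent representations are regressed onto the label matrix instead of an n×n pairwise similarity matrix, and binary codes are obtained by taking the sign of the latent representations.

import numpy as np

def lfmh_sketch(X, Y, L, k=32, alpha=1.0, beta=0.5, iters=30, seed=0):
    # X: (d_x, n) image features, Y: (d_y, n) text features,
    # L: (c, n) label matrix, k: hash code length.
    # All update rules below are illustrative assumptions, not the authors' algorithm.
    rng = np.random.default_rng(seed)
    Ux = 0.01 * rng.standard_normal((X.shape[0], k))   # image basis
    Uy = 0.01 * rng.standard_normal((Y.shape[0], k))   # text basis
    Vx = 0.01 * rng.standard_normal((k, X.shape[1]))   # image latent codes
    Vy = 0.01 * rng.standard_normal((k, Y.shape[1]))   # text latent codes
    W = 0.01 * rng.standard_normal((L.shape[0], k))    # latent-to-label projection
    I = np.eye(k)
    for _ in range(iters):
        # Update bases by ridge-regularized least squares.
        Ux = X @ Vx.T @ np.linalg.inv(Vx @ Vx.T + 1e-3 * I)
        Uy = Y @ Vy.T @ np.linalg.inv(Vy @ Vy.T + 1e-3 * I)
        # Latent codes fit the features, the labels, and each other;
        # no n x n similarity matrix is ever formed.
        Vx = np.linalg.solve(Ux.T @ Ux + alpha * W.T @ W + beta * I,
                             Ux.T @ X + alpha * W.T @ L + beta * Vy)
        Vy = np.linalg.solve(Uy.T @ Uy + alpha * W.T @ W + beta * I,
                             Uy.T @ Y + alpha * W.T @ L + beta * Vx)
        # Update the shared label projection from both modalities.
        W = (L @ Vx.T + L @ Vy.T) @ np.linalg.inv(Vx @ Vx.T + Vy @ Vy.T + 1e-3 * I)
    # Quantize the latent representations to obtain modality-specific hash codes.
    return np.sign(Vx), np.sign(Vy)

# Toy usage: 100 paired image/text samples from 5 hypothetical classes.
rng = np.random.default_rng(1)
X = rng.standard_normal((512, 100))          # image features
Y = rng.standard_normal((300, 100))          # text features
L = np.eye(5)[rng.integers(0, 5, 100)].T     # one-hot labels, shape (5, 100)
Bx, By = lfmh_sketch(X, Y, L, k=16)

In this sketch the label-consistency term (weight alpha) couples both modalities' latent codes to the same label matrix, which is what keeps the resulting binary codes aligned across modalities without ever materializing pairwise similarities.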


    Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
    August 2021
    443 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3476118

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 July 2021
    Accepted: 01 January 2021
    Revised: 01 August 2020
    Received: 01 July 2019
    Published in TOMM Volume 17, Issue 3


    Author Tags

    1. Hashing
    2. cross-modal retrieval
    3. flexible matrix factorization

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • 111 Project of Chinese Ministry of Education

    Cited By

    • (2025) Online semantic embedding correlation for discrete cross-media hashing. Expert Systems with Applications 272, 126758. DOI: 10.1016/j.eswa.2025.126758. Online publication date: May 2025.
    • (2025) Roaen: reversed dependency graph and orthogonal-gating strategy attention-enhanced network for aspect-level sentiment classification. The Journal of Supercomputing 81:1. DOI: 10.1007/s11227-024-06542-7. Online publication date: 1 January 2025.
    • (2025) Csan: cross-coupled semantic adversarial network for cross-modal retrieval. Artificial Intelligence Review 58:5. DOI: 10.1007/s10462-025-11152-7. Online publication date: 1 March 2025.
    • (2024) Optimizing file systems on heterogeneous memory by integrating DRAM cache with virtual memory management. Proceedings of the 22nd USENIX Conference on File and Storage Technologies, 71-88. DOI: 10.5555/3650697.3650702. Online publication date: 27 February 2024.
    • (2024) Fast Unsupervised Cross-Modal Hashing with Robust Factorization and Dual Projection. ACM Transactions on Multimedia Computing, Communications, and Applications 20:12, 1-21. DOI: 10.1145/3694684. Online publication date: 26 November 2024.
    • (2024) Privacy-Enhanced Prototype-based Federated Cross-modal Hashing for Cross-modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3674507. Online publication date: 25 June 2024.
    • (2024) Online Cross-modal Hashing With Dynamic Prototype. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8, 1-18. DOI: 10.1145/3665249. Online publication date: 13 June 2024.
    • (2024) Supervised Hierarchical Online Hashing for Cross-modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 20:4, 1-23. DOI: 10.1145/3632527. Online publication date: 11 January 2024.
    • (2024) TFSemantic: A Time–Frequency Semantic GAN Framework for Imbalanced Classification Using Radio Signals. ACM Transactions on Sensor Networks 20:4, 1-22. DOI: 10.1145/3614096. Online publication date: 11 May 2024.
    • (2024) POLISH: Adaptive Online Cross-Modal Hashing for Class Incremental Data. Proceedings of the ACM Web Conference 2024, 4470-4478. DOI: 10.1145/3589334.3645716. Online publication date: 13 May 2024.
