POLISH: Adaptive Online Cross-Modal Hashing for Class Incremental Data

Authors:

Xin-Shun Xu

WWW '24: Proceedings of the ACM Web Conference 2024

Pages 4470 - 4478

https://doi.org/10.1145/3589334.3645716

Published: 13 May 2024

Abstract

In recent years, hashing-based online cross-modal retrieval has attracted growing attention. The trend is motivated by the fact that web data increasingly arrives as a stream rather than in batches. At the same time, the sheer scale of web data sometimes makes it impractical to load the data in full for training hashing models. Despite the progress of online cross-modal hashing techniques, several challenges remain: 1) Most existing methods learn hash codes by considering only the relevance among newly arriving data, or between new data and existing data, and often disregard valuable global semantic information. 2) Many methods make the limiting assumption that the label space remains constant, which implies that all class labels must be provided within the first data chunk. This assumption does not hold in real-world scenarios, and the presence of new labels in incoming data chunks can severely degrade or even break these methods.

To tackle these issues, we introduce a novel supervised online cross-modal hashing method named adaPtive Online cLass-Incremental haSHing (POLISH). Leveraging insights from language models, POLISH generates representations for new class labels from multiple angles. Meanwhile, POLISH treats label embeddings, which remain unchanged once learned, as stable global information for producing high-quality hash codes. POLISH also introduces an efficient optimization algorithm for hash code learning. Extensive experiments on two real-world benchmark datasets demonstrate the effectiveness of POLISH on class-incremental data in the cross-modal hashing domain.
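The class-incremental setting described above can be illustrated with a minimal sketch. It is not the POLISH algorithm: it only shows the bookkeeping idea that label embeddings stay fixed once stored, while classes first seen in a later chunk are added on arrival. The names `toy_label_embedding` and `LabelEmbeddingTable` are hypothetical, and the seeded random vector is merely a stand-in for a representation that a language model would produce.

```python
import hashlib
import random


def toy_label_embedding(label: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a language-model embedding of a class name."""
    # Derive a deterministic seed from the label so the same name always
    # maps to the same vector.
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


class LabelEmbeddingTable:
    """Keeps one embedding per class; stored entries are never modified."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim
        self.embeddings: dict[str, list[float]] = {}

    def update(self, chunk_labels: list[set[str]]) -> list[str]:
        """Register classes not seen in earlier chunks; return them sorted."""
        seen_in_chunk = set().union(*chunk_labels)
        new_classes = sorted(seen_in_chunk - set(self.embeddings))
        for name in new_classes:
            self.embeddings[name] = toy_label_embedding(name, self.dim)
        return new_classes


# Two streaming chunks; the second one introduces a class unseen so far.
table = LabelEmbeddingTable()
first_new = table.update([{"sky", "water"}, {"sky"}])
sky_before = list(table.embeddings["sky"])
second_new = table.update([{"sky", "tiger"}])
sky_after = table.embeddings["sky"]
```

In this sketch the embedding for `sky` is identical before and after the second chunk, which is the sense in which previously learned label information can act as a stable reference while the label space grows.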

Cited By

Liu Y, Fu Q, Ji S, Fang X (2025). Supervised online multi-modal discrete hashing. Signal Processing 231 (109872). Online publication date: Jun-2025.
https://doi.org/10.1016/j.sigpro.2024.109872
Wang T, Li F, Zhu L, Li J, Zhang Z, Shen H (2024). Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions. Proceedings of the IEEE 112:11 (1716-1754). Online publication date: Nov-2024.
https://doi.org/10.1109/JPROC.2024.3525147
Fan W, Yang C, Luo K, Zhang M, Li H (2024). Category correlations embedded semantic centers hashing for cross-modal retrieval. Information Sciences 683 (121262). Online publication date: Nov-2024.
https://doi.org/10.1016/j.ins.2024.121262

Index Terms

POLISH: Adaptive Online Cross-Modal Hashing for Class Incremental Data
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Similarity measures
    2. Specialized information retrieval
      1. Multimedia and multimodal retrieval

Published In

WWW '24: Proceedings of the ACM Web Conference 2024

May 2024

4826 pages

ISBN:9798400701719

DOI:10.1145/3589334

General Chairs: Tat-Seng Chua (National University of Singapore), Chong-Wah Ngo (Singapore Management University)

Proceedings Chair: Roy Ka-Wei Lee (Singapore University of Technology and Design)

Program Chairs: Ravi Kumar (Google), Hady W. Lauw (Singapore Management University)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Young Scholars Program of Shandong University
Natural Science Foundation of Shandong Province
National Natural Science Foundation of China

Conference

WWW '24

Sponsor:

SIGWEB

WWW '24: The ACM Web Conference 2024

May 13 - 17, 2024

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
