DOI: 10.1145/3589334.3645716
research-article

POLISH: Adaptive Online Cross-Modal Hashing for Class Incremental Data

Published: 13 May 2024

Abstract

In recent years, hashing-based online cross-modal retrieval has garnered growing attention. This trend is motivated by the fact that web data is increasingly delivered in a streaming manner as opposed to batch processing. Simultaneously, the sheer scale of web data sometimes makes it impractical to fully load for the training of hashing models. Despite the evolution of online cross-modal hashing techniques, several challenges remain: 1) Most existing methods learn hash codes by considering the relevance among newly arriving data or between new data and the existing data, often disregarding valuable global semantic information. 2) A common but limiting assumption in many methods is that the label space remains constant, implying that all class labels should be provided within the first data chunk. This assumption does not hold in real-world scenarios, and the presence of new labels in incoming data chunks can severely degrade or even break these methods.
To tackle these issues, we introduce a novel supervised online cross-modal hashing method named adaPtive Online cLass-Incremental haSHing (POLISH). Leveraging insights from language models, POLISH generates representations for new class labels from multiple angles. Meanwhile, POLISH treats label embeddings, which remain unchanged once learned, as stable global information to produce high-quality hash codes. POLISH also puts forward an efficient optimization algorithm for hash code learning. Extensive experiments on two real-world benchmark datasets show the effectiveness of the proposed POLISH for class incremental data in the cross-modal hashing domain.
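
The class-incremental setting described above can be illustrated with a toy sketch. This is not the POLISH algorithm, whose details the abstract does not give. It assumes, purely for illustration, that each label embedding is a fixed ±1 vector derived from the label name (a stand-in for a representation produced by a language model) and that a sample's hash code is the sign of the sum of its label embeddings. The names `label_embedding`, `LabelTable`, `encode_chunk`, and `CODE_BITS` are all hypothetical.

```python
import hashlib

CODE_BITS = 16  # hypothetical hash code length


def label_embedding(name, bits=CODE_BITS):
    """Return a deterministic +/-1 vector for a label name.

    Assumption: a SHA-256 digest stands in for a learned embedding.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return [1 if (digest[i // 8] >> (i % 8)) & 1 else -1 for i in range(bits)]


class LabelTable:
    """Label-embedding table that grows as new classes arrive.

    Embeddings are created once and never modified afterwards.
    """

    def __init__(self):
        self.embeddings = {}

    def observe(self, labels):
        # Only unseen labels get a new entry; existing entries stay frozen.
        for name in labels:
            if name not in self.embeddings:
                self.embeddings[name] = label_embedding(name)

    def hash_code(self, labels):
        # Toy rule: sign of the summed label embeddings (ties map to +1).
        self.observe(labels)
        sums = [0] * CODE_BITS
        for name in labels:
            for i, value in enumerate(self.embeddings[name]):
                sums[i] += value
        return [1 if s >= 0 else -1 for s in sums]


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def encode_chunk(table, chunk):
    """Encode one streaming chunk, given as a list of label lists."""
    return [table.hash_code(labels) for labels in chunk]


table = LabelTable()
first = encode_chunk(table, [["sky"], ["dog", "sky"]])
second = encode_chunk(table, [["car"], ["sky"]])  # "car" is a new class
print(sorted(table.embeddings))   # ['car', 'dog', 'sky']
print(second[1] == first[0])      # True: the "sky" code is unchanged
```

In this sketch the second chunk introduces a label that the first chunk never contained, and the table simply gains one entry. Codes produced before the new class arrived remain valid because the embeddings they were built from are never updated.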

Cited By

  • (2025) Supervised online multi-modal discrete hashing. Signal Processing 231 (109872). DOI: 10.1016/j.sigpro.2024.109872. Online publication date: Jun-2025
  • (2024) Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions. Proceedings of the IEEE 112:11 (1716-1754). DOI: 10.1109/JPROC.2024.3525147. Online publication date: Nov-2024
  • (2024) Category correlations embedded semantic centers hashing for cross-modal retrieval. Information Sciences 683 (121262). DOI: 10.1016/j.ins.2024.121262. Online publication date: Nov-2024

Published In

WWW '24: Proceedings of the ACM Web Conference 2024
May 2024
4826 pages
ISBN: 9798400701719
DOI: 10.1145/3589334
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-modal retrieval
  2. learning to hash
  3. online hashing

Qualifiers

  • Research-article

Funding Sources

  • Young Scholars Program of Shandong University
  • Natural Science Foundation of Shandong Province
  • National Natural Science Foundation of China

Conference

WWW '24
WWW '24: The ACM Web Conference 2024
May 13 - 17, 2024
Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 175
  • Downloads (Last 6 weeks): 14
Reflects downloads up to 05 Mar 2025
