research-article

Supervised Hierarchical Online Hashing for Cross-modal Retrieval

Published: 11 January 2024

Abstract

Online cross-modal hashing has gained attention for its adaptability in processing streaming data. However, existing methods only define the hard similarity between data using labels. This results in poor retrieval performance, as they fail to exploit the semantic structure information of labels and miss the high-quality hash codes guided by the hierarchical relevance between labels. In addition, they ignore the bit-flipping problem, which leads to sub-optimal cross-modal retrieval performance. To address these issues, we propose Supervised Hierarchical Online Hashing (SHOH) for cross-modal retrieval. Our approach acquires hierarchical similarity via cross-layer affiliation of labels and explores its application to online hashing. We design a hierarchical similarity learning method in the online learning framework, which includes virtual center learning and hierarchical similarity embedding. Labels with soft similarity bridge the label hierarchy and cross-modal hash embedding. Furthermore, we propose a Weighted Retrieval Strategy (WRS) to mitigate the impact caused by bit-flipping errors. Extensive experiments and verification on hierarchical and non-hierarchical datasets demonstrate that SHOH preserves accurate inter-class distances and achieves performance improvements compared to state-of-the-art methods. The source code is available at https://github.com/HUST-IDSM-AI/SHOH.
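The abstract contrasts hard label similarity with a soft, hierarchy-aware similarity, and mentions a weighted retrieval step to reduce the effect of unreliable bits. It gives no formulas, so the following is only a minimal sketch of those two general ideas, not SHOH's actual formulation. The label paths, the shared-prefix similarity, the per-bit weights, and all function names are assumptions made for illustration.

```python
def hard_similarity(path_a, path_b):
    """Hard similarity: 1.0 only when the full label paths match."""
    return 1.0 if tuple(path_a) == tuple(path_b) else 0.0


def hierarchical_similarity(path_a, path_b):
    """Soft similarity (assumed definition): fraction of hierarchy
    layers, counted from the root down, on which two label paths agree."""
    if len(path_a) != len(path_b):
        raise ValueError("label paths must have the same depth")
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared / len(path_a)


def weighted_hamming(query_code, db_code, bit_weights):
    """Weighted Hamming distance: each mismatched bit contributes its
    weight, so low-weight (less trusted) bits affect the ranking less."""
    return sum(w for q, d, w in zip(query_code, db_code, bit_weights) if q != d)


# Hypothetical two-layer label paths: (coarse class, fine class).
cat = ("animal", "cat")
dog = ("animal", "dog")
car = ("vehicle", "car")

# Hypothetical 4-bit codes; the last bit is treated as less reliable.
query = [1, -1, 1, 1]
database = [[1, -1, 1, 1], [1, -1, 1, -1], [-1, -1, 1, 1]]
weights = [1.0, 1.0, 1.0, 0.25]

distances = [weighted_hamming(query, code, weights) for code in database]
ranking = sorted(range(len(database)), key=lambda i: distances[i])
```

Under hard similarity, `cat` and `dog` are as dissimilar as `cat` and `car`; the prefix-based score separates the two cases because `cat` and `dog` share the coarse layer. With plain Hamming distance the second and third database codes would tie at distance 1, while the weights rank the code that differs only in the low-weight bit first.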

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 4
April 2024
676 pages
EISSN: 1551-6865
DOI: 10.1145/3613617
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 January 2024
Online AM: 13 November 2023
Accepted: 26 October 2023
Revised: 19 September 2023
Received: 17 May 2023
Published in TOMM Volume 20, Issue 4

Author Tags

  1. Online hashing
  2. cross-modal retrieval
  3. label hierarchy

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Natural Science Foundation of Hubei Province

Article Metrics

  • Downloads (Last 12 months): 485
  • Downloads (Last 6 weeks): 42
Reflects downloads up to 27 Feb 2025

Cited By

  • (2025) Parameter Adaptive Contrastive Hashing for multimedia retrieval. Neural Networks 182 (106923). DOI: 10.1016/j.neunet.2024.106923. Online publication date: Feb-2025
  • (2025) Adaptive Asymmetric Supervised Cross-Modal Hashing with consensus matrix. Information Processing & Management 62:3 (104037). DOI: 10.1016/j.ipm.2024.104037. Online publication date: May-2025
  • (2025) Unsupervised Adaptive Hypergraph Correlation Hashing for multimedia retrieval. Information Processing & Management 62:2 (103958). DOI: 10.1016/j.ipm.2024.103958. Online publication date: Mar-2025
  • (2025) Unsupervised random walk manifold contrastive hashing for multimedia retrieval. Complex & Intelligent Systems 11:4. DOI: 10.1007/s40747-025-01814-y. Online publication date: 28-Feb-2025
  • (2024) Privacy-Enhanced Prototype-Based Federated Cross-Modal Hashing for Cross-Modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 20:9 (1-19). DOI: 10.1145/3674507. Online publication date: 25-Jun-2024
  • (2024) Deep Lifelong Cross-Modal Hashing. IEEE Transactions on Circuits and Systems for Video Technology 34:12_Part_2 (13478-13493). DOI: 10.1109/TCSVT.2024.3450490. Online publication date: 1-Dec-2024
  • (2024) Unsupervised Online Cross-modal Hashing With Multiple Association Exploitation. 2024 IEEE International Conference on Multimedia and Expo (ICME) (1-6). DOI: 10.1109/ICME57554.2024.10688371. Online publication date: 15-Jul-2024
  • (2024) Supervised Semantic-Embedded Hashing for Multimedia Retrieval. Knowledge-Based Systems 299 (112023). DOI: 10.1016/j.knosys.2024.112023. Online publication date: Sep-2024
  • (2024) Category correlations embedded semantic centers hashing for cross-modal retrieval. Information Sciences 683 (121262). DOI: 10.1016/j.ins.2024.121262. Online publication date: Nov-2024
  • Multi-scale Consistency Deep Lifelong Cross-modal Hashing. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3704636
