
Text-assisted attention-based cross-modal hashing

  • Regular Paper
  • Published in: International Journal of Multimedia Information Retrieval

Abstract

As one of the most active research topics in multimedia information retrieval, cross-modal hashing has drawn widespread attention over the past decades. Minimizing the semantic gap between heterogeneous data and accurately computing the similarity of cross-modal data are the key challenges of this task. A common paradigm is to map the features of multi-modal data into a common space. However, such approaches lack inter-modal information interaction and may not achieve satisfactory results. To overcome this limitation, we propose a novel text-assisted attention-based cross-modal hashing (TAACH) method. First, TAACH relies on LabelNet supervision to guide the learning of the hash function for each modality. In addition, a novel text-assisted attention mechanism densely integrates text features into image features, perceiving their spatial correlation and enhancing the consistency between image and text knowledge. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed TAACH, which achieves competitive performance compared with state-of-the-art methods. The source code is available at https://github.com/SWU-CS-MediaLab/TAACH.
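The full paper details the architecture; purely as an illustration, the sketch below shows one plausible PyTorch realization of the two ideas named in the abstract: a text feature acting as the attention query over spatial image features, followed by a tanh-relaxed hashing head. All module names, dimensions, the pooling step, and the residual fusion are our assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Hypothetical sketch of text-assisted attention + a hashing head.
# Not the authors' code: shapes, fusion, and pooling are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextAssistedAttention(nn.Module):
    """Text-guided attention over spatial image features (illustrative)."""

    def __init__(self, img_dim: int, txt_dim: int, attn_dim: int = 256):
        super().__init__()
        self.q = nn.Linear(txt_dim, attn_dim)  # query projected from the text feature
        self.k = nn.Linear(img_dim, attn_dim)  # keys projected from image regions
        self.v = nn.Linear(img_dim, img_dim)   # values kept in the image feature space
        self.scale = attn_dim ** -0.5

    def forward(self, img_feats: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # img_feats: (B, N, img_dim) -- N spatial regions of a CNN feature map
        # txt_feat:  (B, txt_dim)    -- one pooled text embedding per image-text pair
        q = self.q(txt_feat).unsqueeze(1)                             # (B, 1, attn_dim)
        k = self.k(img_feats)                                         # (B, N, attn_dim)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, 1, N)
        context = attn @ self.v(img_feats)                            # (B, 1, img_dim)
        # Residual fusion: broadcast the text-guided context over all regions,
        # so image features are enriched by text rather than replaced.
        return img_feats + context                                    # (B, N, img_dim)


class HashHead(nn.Module):
    """Maps fused features to continuous codes in (-1, 1); sign() binarizes at test time."""

    def __init__(self, in_dim: int, code_len: int = 64):
        super().__init__()
        self.fc = nn.Linear(in_dim, code_len)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        pooled = feats.mean(dim=1)           # pool the region axis: (B, in_dim)
        return torch.tanh(self.fc(pooled))   # tanh relaxation of binary codes for training


if __name__ == "__main__":
    attn = TextAssistedAttention(img_dim=512, txt_dim=300)
    head = HashHead(in_dim=512, code_len=64)
    imgs = torch.randn(4, 49, 512)  # e.g. a 7x7 conv map flattened to 49 regions
    txts = torch.randn(4, 300)      # e.g. bag-of-words text embeddings
    codes = head(attn(imgs, txts))
    print(codes.shape, codes.sign().shape)  # torch.Size([4, 64]) twice
```

The tanh relaxation is a standard trick in deep hashing: gradients flow through the continuous codes during training, and sign() yields the final binary codes for Hamming-distance retrieval.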


Data availability

The MIRFLICKR-25K dataset is available via [32], the NUS-WIDE dataset via [33], the Microsoft COCO 2014 dataset via [19], and the IAPR TC-12 dataset via [34].

References

  1. Peng Y, Huang X, Zhao Y (2018) An overview of cross-media retrieval: concepts, methodologies, benchmarks, and challenges. IEEE Trans Circuits Syst Video Technol 28(9):2372–2385

  2. Wang K, Yin Q, Wang W, Wu S, Wang L (2016) A comprehensive survey on cross-modal retrieval. arXiv preprint arXiv:1607.06215

  3. Ding G, Guo Y, Zhou J, Gao Y (2016) Large-scale cross-modality search via collective matrix factorization hashing. IEEE Trans Image Process 25(11):5427–5440

  4. Ding K, Fan B, Huo C, Xiang S, Pan C (2016) Cross-modal hashing via rank-order preserving. IEEE Trans Multimed 19(3):571–585

  5. Gu J, Cai J, Joty SR, Niu L, Wang G (2018) Look, imagine and match: improving textual-visual cross-modal retrieval with generative models. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7181–7189

  6. Kumar S, Udupa R (2011) Learning hash functions for cross-view similarity search. In: Twenty-second international joint conference on artificial intelligence

  7. Song J, Yang Y, Yang Y, Huang Z, Shen HT (2013) Inter-media hashing for large-scale retrieval from heterogeneous data sources. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data, pp 785–796

  8. Zhou J, Ding G, Guo Y (2014) Latent semantic sparse hashing for cross-modal similarity search. In: Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval, pp 415–424

  9. Bronstein MM, Bronstein AM, Michel F, Paragios N (2010) Data fusion through cross-modality metric learning using similarity-sensitive hashing. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 3594–3601

  10. Lin Z, Ding G, Hu M, Wang J (2015) Semantics-preserving hashing for cross-view retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3864–3872

  11. Wang D, Gao X, Wang X, He L (2015) Semantic topic multimodal hashing for cross-media retrieval. In: Twenty-fourth international joint conference on artificial intelligence

  12. Wu F, Yu Z, Yang Y, Tang S, Zhang Y, Zhuang Y (2013) Sparse multi-modal hashing. IEEE Trans Multimed 16(2):427–439

  13. Zhang D, Li WJ (2014) Large-scale supervised multimodal hashing with semantic correlation maximization. In: Proceedings of the AAAI conference on artificial intelligence, vol 28

  14. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828

  15. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90

  16. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

  17. Cao Y, Long M, Wang J, Yang Q, Yu PS (2016) Deep visual-semantic hashing for cross-modal retrieval. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1445–1454

  18. Jiang QY, Li WJ (2017) Deep cross-modal hashing. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3232–3240

  19. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: Computer vision–ECCV 2014: 13th European conference, Zurich, Switzerland, 6–12 Sept 2014, proceedings, part V. Springer, pp 740–755

  20. Shen Y, Liu L, Shao L, Song J (2017) Deep binaries: encoding semantic-rich cues for efficient textual-visual cross retrieval. In: Proceedings of the IEEE international conference on computer vision, pp 4097–4106

  21. Yang E, Deng C, Liu W, Liu X, Tao D, Gao X (2017) Pairwise relationship guided deep hashing for cross-modal retrieval. In: Proceedings of the AAAI conference on artificial intelligence, vol 31

  22. Li C, Deng C, Li N, Liu W, Gao X, Tao D (2018) Self-supervised adversarial hashing networks for cross-modal retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4242–4251

  23. Ma X, Zhang T, Xu C (2020) Multi-level correlation adversarial hashing for cross-modal retrieval. IEEE Trans Multimed 22(12):3101–3114

  24. Wang J, Zhang T, Sebe N, Shen HT (2017) A survey on learning to hash. IEEE Trans Pattern Anal Mach Intell 40(4):769–790

  25. Wang X, Zou X, Bakker EM, Song W (2020) Self-constraining and attention-based hashing network for bit-scalable cross-modal retrieval. Neurocomputing 400:255–271

  26. Zou X, Wu S, Zhang N, Bakker EM (2022) Multi-label modality enhanced attention based self-supervised deep cross-modal hashing. Knowl Based Syst 239:107927

  27. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30

  28. Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 1480–1489

  29. Zhang X, Lai H, Feng J (2018) Attention-aware deep adversarial hashing for cross-modal retrieval. In: Proceedings of the European conference on computer vision (ECCV), pp 591–606

  30. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: International conference on machine learning

  31. Cao Y, Long M, Wang J, Yu PS (2016) Correlation hashing network for efficient cross-modal retrieval. arXiv preprint arXiv:1602.06697

  32. Huiskes MJ, Lew MS (2008) The MIR Flickr retrieval evaluation. In: Proceedings of the 1st ACM international conference on multimedia information retrieval, pp 39–43

  33. Chua TS, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of the ACM international conference on image and video retrieval, pp 1–9

  34. Escalante HJ, Hernández CA, Gonzalez JA, López-López A, Montes M, Morales EF, Enrique Sucar L, Villasenor L, Grubinger M (2010) The segmented and annotated IAPR TC-12 benchmark. Comput Vis Image Underst 114(4):419–428

  35. Mandal D, Chaudhury KN, Biswas S (2017) Generalized semantic preserving hashing for n-label cross-modal retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4076–4084

  36. Zou X, Wang X, Bakker EM, Wu S (2021) Multi-label semantics preserving based deep cross-modal hashing. Signal Process Image Commun 93:116131

Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities, China (SWU-KT22032).

Author information

Contributions

Xiang Yuan contributed to the conception of the research, software, investigation, implementation of the experiments, and writing of the manuscript. Shihao Shan contributed to the methodology and software. Yuwen Huo contributed to the revision and editing of the manuscript. Junkai Jiang contributed to the methodology and software. Song Wu contributed to the research conception, methodology, and software, and wrote, reviewed, and edited the manuscript.

Corresponding author

Correspondence to Song Wu.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests pertinent to the subject matter of this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yuan, X., Shan, S., Huo, Y. et al. Text-assisted attention-based cross-modal hashing. Int J Multimed Info Retr 13, 3 (2024). https://doi.org/10.1007/s13735-023-00311-7
