
A novel deep translated attention hashing for cross-modal retrieval

Multimedia Tools and Applications

Abstract

In recent years, driven by the growing volume of cross-modal data such as images and texts, cross-modal retrieval has received intensive attention. Great progress has been made in deep cross-modal hash retrieval, which integrates feature learning and hash learning into an end-to-end trainable framework to obtain better hash codes. However, owing to the heterogeneity between images and texts, comparing their similarity remains a challenge. Most previous approaches embed images and texts into a joint embedding subspace independently and then compare their similarity, ignoring both the influence of irrelevant regions (image regions without a corresponding textual description) on cross-modal retrieval and the fine-grained interactions between images and texts. To address these issues, a new cross-modal hashing method called Deep Translated Attention Hashing for Cross-Modal Retrieval (DTAH) is proposed. First, DTAH extracts image and text features through bottom-up attention and a recurrent neural network, respectively, to reduce the influence of irrelevant regions on cross-modal retrieval. Then, with the help of a cross-modal attention module, DTAH captures the fine-grained interactions between vision and language at the region and word levels and embeds the text features into the image feature space. In this way, the proposed DTAH effectively narrows the heterogeneity gap between images and texts and learns discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate that DTAH surpasses state-of-the-art methods.
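To make the described pipeline concrete, below is a minimal PyTorch sketch of the idea: image region features (as would come from a bottom-up attention detector) attend over word features (as from a recurrent encoder), so the text is translated into the image feature space before a shared hashing layer. The module names, feature dimensions, scaled-dot-product scoring, and mean pooling are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class CrossModalAttentionHashing(nn.Module):
    """Sketch of a DTAH-style cross-modal attention hashing pipeline (assumed design)."""

    def __init__(self, region_dim=2048, word_dim=1024, joint_dim=512, hash_bits=64):
        super().__init__()
        self.img_proj = nn.Linear(region_dim, joint_dim)   # project bottom-up region features
        self.txt_proj = nn.Linear(word_dim, joint_dim)     # project recurrent word features
        self.hash_layer = nn.Linear(joint_dim, hash_bits)  # shared projection to hash codes

    def forward(self, regions, words):
        # regions: (B, R, region_dim) image region features, e.g. from a detector
        # words:   (B, W, word_dim) per-word features, e.g. from a GRU
        img = self.img_proj(regions)                       # (B, R, D)
        txt = self.txt_proj(words)                         # (B, W, D)

        # Region-word affinities: each image region attends over all words,
        # so the attended text is "translated" into the image feature space.
        attn = torch.softmax(img @ txt.transpose(1, 2) / img.size(-1) ** 0.5, dim=-1)  # (B, R, W)
        txt_translated = attn @ txt                        # (B, R, D)

        # Pool over regions and map to continuous codes; tanh keeps outputs
        # close to the binary {-1, +1} codes used at retrieval time.
        img_code = torch.tanh(self.hash_layer(img.mean(dim=1)))             # (B, hash_bits)
        txt_code = torch.tanh(self.hash_layer(txt_translated.mean(dim=1)))  # (B, hash_bits)
        return img_code, txt_code


# Toy usage: 2 samples, 36 regions, 20 words (shapes are hypothetical).
model = CrossModalAttentionHashing()
img_code, txt_code = model(torch.randn(2, 36, 2048), torch.randn(2, 20, 1024))
binary_img, binary_txt = img_code.sign(), txt_code.sign()
```

At query time the continuous codes would be binarized with sign(), as in the toy usage above, and images and texts ranked by Hamming distance between their binary codes.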



Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 62020106011 and 61828105, and by the Chen Guang Project of the Shanghai Municipal Education Commission and the Shanghai Education Development Foundation under Grant No. 17CG41.

Author information


Corresponding author

Correspondence to Ran Ma.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yu, H., Ma, R., Su, M. et al. A novel deep translated attention hashing for cross-modal retrieval. Multimed Tools Appl 81, 26443–26461 (2022). https://doi.org/10.1007/s11042-022-12860-w

