
Semantic ranking structure preserving for cross-modal retrieval

Abstract

Cross-modal retrieval must not only eliminate the heterogeneity between modalities, but also constrain the ranking order of the retrieved results. Accordingly, in this paper we propose a novel common representation space learning method, called Semantic Ranking Structure Preserving (SRSP), for cross-modal retrieval. First, the dependency relationships between labels are used to minimize the discriminative loss of multi-modal data and to mine latent relationships between samples, yielding richer semantic information in the common space. Second, we constrain the correlation ranking of the representations in the common space, so as to bridge the modality gap and promote multi-modal correlation learning. Comprehensive experimental comparisons show that our algorithm substantially improves retrieval performance and consistently outperforms very recent algorithms on widely used cross-modal benchmark datasets.
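To make the two components described above concrete, the following is a minimal PyTorch-style sketch of a shared-space model trained with (i) a label-driven discriminative loss over the common representations and (ii) a margin-based ranking loss that encourages cross-modal similarities to follow the semantic (label-overlap) order. The network sizes, loss forms, and hyper-parameters are assumptions for illustration only, not the authors' implementation.

```python
# Illustrative sketch (assumed architecture and losses, not the paper's code):
# a common representation space for images and texts, supervised by multi-hot
# labels, with a ranking term that preserves semantic order across modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CommonSpaceModel(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=512, common_dim=256, num_labels=24):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(img_dim, 1024), nn.ReLU(),
                                      nn.Linear(1024, common_dim))
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, 1024), nn.ReLU(),
                                      nn.Linear(1024, common_dim))
        self.classifier = nn.Linear(common_dim, num_labels)  # shared label head

    def forward(self, img_feat, txt_feat):
        # L2-normalize so dot products act as cosine similarities.
        v = F.normalize(self.img_proj(img_feat), dim=1)
        t = F.normalize(self.txt_proj(txt_feat), dim=1)
        return v, t


def discriminative_loss(model, v, t, labels):
    # Multi-label classification on both modalities' common representations.
    logits = torch.cat([model.classifier(v), model.classifier(t)], dim=0)
    targets = torch.cat([labels, labels], dim=0).float()
    return F.binary_cross_entropy_with_logits(logits, targets)


def ranking_structure_loss(v, t, labels, margin=0.2):
    # Semantic relevance of each (image, text) pair = label overlap count.
    rel = labels.float() @ labels.float().t()          # (B, B)
    sim = v @ t.t()                                    # cosine similarities (B, B)
    # For query image i and texts (j, k): if j shares more labels with i than k,
    # its similarity should exceed k's by a margin.
    rel_diff = rel.unsqueeze(2) - rel.unsqueeze(1)     # rel[i, j] - rel[i, k]
    sim_diff = sim.unsqueeze(2) - sim.unsqueeze(1)     # sim[i, j] - sim[i, k]
    mask = (rel_diff > 0).float()
    hinge = torch.clamp(margin - sim_diff, min=0.0) * mask
    return hinge.sum() / mask.sum().clamp(min=1.0)


# Minimal usage with random tensors standing in for extracted features.
model = CommonSpaceModel()
img = torch.randn(8, 4096)
txt = torch.randn(8, 512)
labels = (torch.rand(8, 24) > 0.8).long()
v, t = model(img, txt)
loss = discriminative_loss(model, v, t, labels) + ranking_structure_loss(v, t, labels)
loss.backward()
```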

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2017YFB1402400), the National Natural Science Foundation of China (No. 61762025), the Guangxi Key Laboratory of Trusted Software (No. kx202006), the Guangxi Key Laboratory of Optoelectronic Information Processing (No. GD18202), and the Natural Science Foundation of Guangxi Province, China (No. 2019GXNSFDA185007).

Author information

Corresponding author

Correspondence to Yong Feng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Liu, H., Feng, Y., Zhou, M. et al. Semantic ranking structure preserving for cross-modal retrieval. Appl Intell 51, 1802–1812 (2021). https://doi.org/10.1007/s10489-020-01930-x
