
Cross-media retrieval based on linear discriminant analysis

Published in: Multimedia Tools and Applications

Abstract

Existing cross-media retrieval approaches usually project low-level features from different data modalities into a common subspace, in which the similarity of multi-modal data can be measured directly. However, most previous subspace learning methods ignore the discriminative properties of multi-modal data, which may lead to suboptimal cross-media retrieval performance. To address this problem, we propose a novel cross-media retrieval framework based on Linear Discriminant Analysis (LDA). It exploits the correlation between textual and visual features to learn a pair of projection matrices that map the low-level heterogeneous features into a shared feature space. In this way, the discriminative structure of the textual modality is transferred to the corresponding visual features through the correlation analysis process. Experiments on three benchmark datasets demonstrate the effectiveness of our approach.
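The pipeline outlined above can be pictured concretely. Below is a minimal sketch, not the authors' exact formulation: LDA supplies a discriminative subspace on the text side, a CCA step stands in for the correlation analysis that couples the visual features to it, and retrieval is done by cosine similarity in the shared space. All data, dimensions, and the use of scikit-learn components are illustrative assumptions.

```python
# Hypothetical sketch of an LDA-guided common-subspace pipeline for
# cross-media retrieval. Dummy data only; dimensions are placeholders.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n, d_img, d_txt, n_classes, d_shared = 200, 128, 50, 10, 9

X_img = rng.normal(size=(n, d_img))            # low-level visual features
X_txt = rng.normal(size=(n, d_txt))            # textual features (e.g., topic vectors)
y = np.repeat(np.arange(n_classes), n // n_classes)  # shared semantic labels

# 1) Make the text representation discriminative with LDA
#    (at most n_classes - 1 dimensions are available).
lda = LinearDiscriminantAnalysis(n_components=d_shared)
T_txt = lda.fit_transform(X_txt, y)

# 2) Correlate the two modalities so that the visual projection inherits
#    the discriminative structure carried by the LDA-transformed text.
cca = CCA(n_components=d_shared)
Z_img, Z_txt = cca.fit_transform(X_img, T_txt)

# 3) Cross-media retrieval: rank text items for an image query by
#    cosine similarity in the shared feature space.
def l2norm(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

sims = l2norm(Z_img) @ l2norm(Z_txt).T         # (n_images, n_texts) similarities
ranking_for_image0 = np.argsort(-sims[0])      # text indices, best match first
```

In this sketch the CCA step is a stand-in for the paper's correlation analysis; the key idea it illustrates is that the visual projection is learned against an already discriminative text representation rather than against raw text features.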



Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Nos. 61572298, 61772322) and the Key Research and Development Foundation of Shandong Province, China (Nos. 2017CXGC0703, 2017GGX10117, 2016GGX101009).

Author information

Correspondence to Huaxiang Zhang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

ESM 1

(RAR 10592 kb)


About this article


Cite this article

Qi, Y., Zhang, H., Zhang, B. et al. Cross-media retrieval based on linear discriminant analysis. Multimed Tools Appl 78, 24249–24268 (2019). https://doi.org/10.1007/s11042-018-6994-1

