Abstract
Existing cross-media retrieval approaches usually project low-level features from different modalities into a common subspace, in which the similarity of multi-modal data can be measured directly. However, most previous subspace learning methods ignore the discriminative property of multi-modal data, which may lead to suboptimal retrieval performance. To address this problem, we propose a novel cross-media retrieval framework based on Linear Discriminant Analysis (LDA). The framework integrates the correlation between textual and visual features to learn a pair of projection matrices that map the low-level heterogeneous features into a shared feature space. In this way, the discriminative characteristic of the textual modality is transferred to the corresponding visual features via the correlation analysis process. Experiments on three benchmark datasets show the effectiveness of our approach.
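The general idea of learning a pair of projections into a shared space can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the objective function, so the sketch assumes standard Fisher LDA on the textual features and swaps in a plain ridge regression for the correlation analysis step, so that visual features are mapped onto the LDA-projected text of their paired samples. The data are synthetic, and the names `lda_projection`, `ridge_map`, `reg`, and `lam` are hypothetical.

```python
import numpy as np


def lda_projection(X, y, reg=1e-3):
    """Return the feature mean and a Fisher LDA projection with (n_classes - 1) columns."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve Sb w = lambda (Sw + reg I) w by whitening with a Cholesky factor.
    L = np.linalg.cholesky(Sw + reg * np.eye(d))
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    top = np.argsort(evals)[::-1][: len(classes) - 1]
    return mean, L_inv.T @ evecs[:, top]


def ridge_map(V, Z, lam=1.0):
    """Assumed stand-in for correlation analysis: ridge regression from V onto Z."""
    mean = V.mean(axis=0)
    Vc = V - mean
    W = np.linalg.solve(Vc.T @ Vc + lam * np.eye(V.shape[1]), Vc.T @ Z)
    return mean, W


# Synthetic paired data (assumption): 3 classes, 30 text/image pairs per class.
rng = np.random.default_rng(0)
n_classes, n_per, d_text, d_img = 3, 30, 10, 20
y = np.repeat(np.arange(n_classes), n_per)
T = (3.0 * rng.normal(size=(n_classes, d_text)))[y] + 0.3 * rng.normal(size=(len(y), d_text))
V = (3.0 * rng.normal(size=(n_classes, d_img)))[y] + 0.3 * rng.normal(size=(len(y), d_img))

# Text projection from LDA, then an image projection fitted to the projected text.
mu_t, W_t = lda_projection(T, y)
Z_text = (T - mu_t) @ W_t
mu_v, W_v = ridge_map(V, Z_text)
Z_img = (V - mu_v) @ W_v

# Image-to-text retrieval on the training pairs: nearest text by squared Euclidean distance.
dists = ((Z_img[:, None, :] - Z_text[None, :, :]) ** 2).sum(axis=2)
nearest = dists.argmin(axis=1)
accuracy = float((y[nearest] == y).mean())
print(f"top-1 class match rate: {accuracy:.2f}")
```

Because the text projection is fitted with class labels and the image projection is fitted to reproduce it, class structure learned on the text side carries over to the image side, which is the transfer effect described above. A real evaluation would use held-out queries and the benchmark features rather than the training pairs used here.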
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (Nos. 61572298, 61772322) and the Key Research and Development Foundation of Shandong Province, China (Nos. 2017CXGC0703, 2017GGX10117, 2016GGX101009).
Cite this article
Qi, Y., Zhang, H., Zhang, B. et al. Cross-media retrieval based on linear discriminant analysis. Multimed Tools Appl 78, 24249–24268 (2019). https://doi.org/10.1007/s11042-018-6994-1
DOI: https://doi.org/10.1007/s11042-018-6994-1