
Cross-media retrieval based on semi-supervised regularization and correlation learning

Multimedia Tools and Applications

Abstract

As large-scale multimedia data lying in heterogeneous spaces floods the Internet, cross-media retrieval is becoming increasingly important. In cross-media retrieval, a user can submit a query of any media type and retrieve results of various media types. However, most existing cross-media retrieval methods are restricted to retrieval between two media types and therefore ignore the semantic consistency among data of multiple media types. In addition, although some methods consider the similarity between data of the same semantic category in different media, they neglect the dissimilarity between data of different semantic categories in different media. To address these problems, we propose a novel feature learning algorithm for cross-media retrieval, called semi-supervised regularization and correlation learning (SSRCL), which models multiple media types simultaneously. More importantly, SSRCL considers semantic category similarity and dissimilarity jointly, and uses both labeled and unlabeled data to learn the projection matrices for the different media types. Experimental results on two widely used datasets show that the proposed approach outperforms four state-of-the-art methods.
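The abstract does not give the SSRCL objective, so the following is only a minimal, hypothetical sketch of the ingredients it names, not the authors' formulation. Each media type gets its own projection matrix into a shared k-dimensional space; a cross-media term pulls labeled pairs of the same category together and pushes pairs of different categories apart (modeled here, by assumption, as a squared-distance pull plus a hinge with a `margin`); and a semi-supervised regularizer uses labeled and unlabeled samples alike (swapped in here as a heat-kernel graph smoothness penalty in the spirit of manifold regularization). The function names, hyperparameters (`alpha`, `beta`, `sigma`, `lam`, `margin`, `lr`), plain gradient descent, and the toy data are all illustrative assumptions.

```python
import math
import random

# Hypothetical toy data: media type -> list of (feature vector, label or None).
# None marks an unlabeled sample; each media type has its own feature dimension.
TOY_DATA = {
    "image": [([1.0, 0.0, 0.0], 0), ([0.9, 0.1, 0.0], 0), ([0.0, 1.0, 0.0], 1),
              ([0.0, 0.9, 0.1], 1), ([0.95, 0.05, 0.0], None)],
    "text": [([1.0, 0.0], 0), ([0.8, 0.2], 0), ([0.0, 1.0], 1),
             ([0.1, 0.9], 1), ([0.2, 0.8], None)],
}


def project(x, W):
    """Map feature vector x (length d) into the k-dim common space: z = x W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def add_outer(grad, x, diff, scale):
    """grad += scale * outer(x, diff); note d||xW - c||^2/dW = 2 * outer(x, xW - c)."""
    for i, xi in enumerate(x):
        for j, dj in enumerate(diff):
            grad[i][j] += scale * xi * dj


def loss_and_grad(W, data, margin, alpha, beta, sigma, lam):
    """Hypothetical objective: ridge + cross-media correlation + graph smoothness."""
    loss = sum(lam * w * w for m in W for row in W[m] for w in row)
    grad = {m: [[2 * lam * w for w in row] for row in W[m]] for m in W}
    Z = {m: [project(x, W[m]) for x, _ in data[m]] for m in data}
    media = sorted(data)
    # 1) Cross-media similarity/dissimilarity on labeled pairs from different media.
    for a, p in enumerate(media):
        for q in media[a + 1:]:
            for (xa, ya), za in zip(data[p], Z[p]):
                for (xb, yb), zb in zip(data[q], Z[q]):
                    if ya is None or yb is None:
                        continue
                    diff = [u - v for u, v in zip(za, zb)]
                    d2 = sq_dist(za, zb)
                    if ya == yb:          # same category: pull together
                        loss += alpha * d2
                        s = 2 * alpha
                    elif d2 < margin:     # different category inside margin: push apart
                        loss += alpha * (margin - d2)
                        s = -2 * alpha
                    else:
                        continue
                    add_outer(grad[p], xa, diff, s)
                    add_outer(grad[q], xb, diff, -s)
    # 2) Graph regularization within each media type over labeled AND unlabeled data.
    for m in media:
        pts = data[m]
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                w = math.exp(-sq_dist(pts[i][0], pts[j][0]) / sigma)
                diff = [u - v for u, v in zip(Z[m][i], Z[m][j])]
                loss += beta * w * sq_dist(Z[m][i], Z[m][j])
                add_outer(grad[m], pts[i][0], diff, 2 * beta * w)
                add_outer(grad[m], pts[j][0], diff, -2 * beta * w)
    return loss, grad


def train(data, k=2, margin=1.0, alpha=1.0, beta=0.1, sigma=0.5, lam=0.01,
          lr=0.02, epochs=300, seed=0):
    """Plain gradient descent on one projection matrix per media type."""
    rng = random.Random(seed)
    W = {m: [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in data[m][0][0]]
         for m in data}
    history = []
    for _ in range(epochs):
        loss, grad = loss_and_grad(W, data, margin, alpha, beta, sigma, lam)
        history.append(loss)
        for m in W:
            for i, row in enumerate(W[m]):
                for j in range(k):
                    row[j] -= lr * grad[m][i][j]
    return W, history


def retrieve(query, q_media, target_media, W, data):
    """Return labels of target_media's labeled items, nearest to the query first."""
    zq = project(query, W[q_media])
    ranked = sorted((sq_dist(zq, project(x, W[target_media])), y)
                    for x, y in data[target_media] if y is not None)
    return [y for _, y in ranked]
```

For example, `W, history = train(TOY_DATA)` followed by `retrieve([1.0, 0.0, 0.0], "image", "text", W, TOY_DATA)` ranks the labeled text items for an image query, and swapping the media arguments gives text-to-image retrieval, since any media type can serve as the query. The hinge is one way to keep the dissimilarity term bounded: a plain negative squared-distance term would let the objective decrease without limit simply by scaling the projections up.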




Acknowledgements

This research is supported by the National Natural Science Foundation of China (Nos. 61373109 and 61602349) and the Educational Research Project of the Educational Commission of Hubei Province (No. 2016234).

Author information


Correspondence to Hong Zhang.


About this article


Cite this article

Zhang, H., Dai, G., Tang, D. et al. Cross-media retrieval based on semi-supervised regularization and correlation learning. Multimed Tools Appl 77, 22455–22473 (2018). https://doi.org/10.1007/s11042-018-6037-y



  • DOI: https://doi.org/10.1007/s11042-018-6037-y
