Cross-media retrieval by intra-media and inter-media correlation mining

  • Regular Paper
Multimedia Systems

Abstract

With the rapid growth of multimedia content on the Internet, cross-media retrieval has become a key problem in both research and application. Cross-media retrieval returns results that share the semantics of the query but may be of different media types. For instance, given a query image of Moraine Lake, a cross-media retrieval system can retrieve not only images of Moraine Lake but also related content of other media types, such as text descriptions. This requires measuring content similarity between different media types, which is a challenging problem. In this paper, we propose a novel cross-media similarity measure. It considers both intra-media and inter-media correlation, which existing works ignore. Intra-media correlation focuses on semantic category information within each media type, while inter-media correlation focuses on positive and negative correlations between different media types. Both are important, and they complement each other when adaptively fused. To mine the intra-media correlation, we propose a heterogeneous similarity measure with nearest neighbors (HSNN). The heterogeneous similarity is obtained by computing the probability that two media objects belong to the same semantic category. To mine the inter-media correlation, we propose a cross-media correlation propagation (CMCP) approach that deals simultaneously with positive and negative correlations between media objects of different media types, whereas existing works focus solely on the positive correlation. Negative correlation is important because it provides effective exclusive information. Positive and negative correlations are modeled as must-link and cannot-link constraints, respectively. Furthermore, our approach is able to propagate the correlation between heterogeneous modalities. Both HSNN and CMCP are flexible, so any traditional similarity measure can be incorporated. Finally, an effective ranking model for cross-media retrieval is learned by fusing multiple similarity measures through AdaRank. Experimental results on two datasets show the effectiveness of the proposed approach compared with state-of-the-art methods.
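
The idea behind the heterogeneous similarity can be illustrated with a minimal sketch. It is not the paper's exact HSNN formulation, which the abstract does not spell out. The sketch assumes that each media object gets a category posterior from its k nearest labeled neighbors within its own media type, and that the similarity of two objects is the probability that they share a category when the two posteriors are treated as independent. The function names, the Euclidean distance, the toy features, and the labels are all hypothetical.

```python
import math
from collections import Counter


def knn_category_posterior(query, train_feats, train_labels, k=3):
    """Estimate P(category | query) from the k nearest labeled neighbors
    within a single media type (Euclidean distance is an assumption)."""
    ranked = sorted(range(len(train_feats)),
                    key=lambda i: math.dist(query, train_feats[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    total = sum(votes.values())
    return {label: count / total for label, count in votes.items()}


def heterogeneous_similarity(posterior_a, posterior_b):
    """Probability that two objects share a category, treating the two
    category posteriors as independent (a simplifying assumption)."""
    return sum(p * posterior_b.get(label, 0.0)
               for label, p in posterior_a.items())


# Toy labeled data: images live in a 2-D feature space and texts in a
# 3-D one, so their features cannot be compared directly.
image_feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
               (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
image_labels = ["lake", "lake", "lake", "city", "city", "city"]
text_feats = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.8, 0.0, 0.1),
              (0.0, 1.0, 0.0), (0.1, 0.9, 0.0), (0.0, 0.8, 0.1)]
text_labels = ["lake", "lake", "lake", "city", "city", "city"]

# One image query and two candidate texts, each scored in its own space.
query_post = knn_category_posterior((0.05, 0.05), image_feats, image_labels)
lake_text_post = knn_category_posterior((0.95, 0.05, 0.0), text_feats, text_labels)
city_text_post = knn_category_posterior((0.05, 0.95, 0.0), text_feats, text_labels)

sim_lake = heterogeneous_similarity(query_post, lake_text_post)
sim_city = heterogeneous_similarity(query_post, city_text_post)
print(sim_lake, sim_city)
```

Because both objects are mapped to posteriors over the same set of categories, the comparison never needs a shared feature space. Any traditional within-media distance could replace `math.dist` here.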

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61073084, the Beijing Natural Science Foundation of China under Grant 4122035, the National Hi-Tech Research and Development Program (863 Program) of China under Grant 2012AA012503, the National Development and Reform Commission High-tech Program of China under Grant [2010]3044, and the National Key Technology Research and Development Program of China under Grants 2012BAH07B01 and 2012BAH18B03.

Author information

Corresponding author

Correspondence to Yuxin Peng.

Additional information

Communicated by B. Prabhakaran.

About this article

Cite this article

Zhai, X., Peng, Y. & Xiao, J. Cross-media retrieval by intra-media and inter-media correlation mining. Multimedia Systems 19, 395–406 (2013). https://doi.org/10.1007/s00530-012-0297-6

  • DOI: https://doi.org/10.1007/s00530-012-0297-6
