Abstract
With the development of multimedia technology, the demand for effective cross-modal retrieval methods is growing. The key to cross-modal retrieval is modeling the correlation between heterogeneous modalities. There are two main types of correlation: content correlation and semantic correlation. Semantic correlation is constructed at a high level of abstraction and is therefore closer to human understanding than content correlation. In this paper, we investigate a semantic model that constructs semantic correlation for cross-modal retrieval. We assume that the semantic correlation of multimedia data from different modalities can be conditionally generated by semantic concepts within a probabilistic generation framework, and we propose the cross-modal semantic generation model (CMSGM) based on this assumption. We consider three cases of the cross-modal retrieval task. The first is the ideal case in which all manifest concepts exist in the training data for constructing the correlation; for this case we propose manifest CMSGM (M-CMSGM), which applies CMSGM directly to the manifest semantic concepts for retrieval. The second is the case in which no manifest concepts exist in the training data; for this case we propose latent CMSGM (L-CMSGM), which is based on latent semantic concepts learned by asymmetric spectral clustering. The last is the most general case, in which only some of the manifest concepts exist; here we combine M-CMSGM and L-CMSGM into combinative CMSGM (C-CMSGM). Experimental results on Wikipedia featured articles and MIR Flickr show that our methods outperform previous state-of-the-art methods. Moreover, C-CMSGM maintains good performance when manifest concepts are lacking, which confirms its robustness and practicality.
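As a rough illustration of the conditional-generation assumption above, the sketch below scores cross-modal similarity by marginalizing over semantic concepts: score(x, y) = Σ_c P(c | x) · P(y | c). All names here (`cross_modal_score`, `posterior`, `likelihoods`) are illustrative placeholders rather than the paper's notation, and in practice the probabilities would come from the learned generation model, not be given by hand.

```python
def cross_modal_score(p_c_given_x, p_y_given_c):
    """Score N candidate items from the target modality against one query.

    p_c_given_x: list of K floats, the concept posterior for the query
                 (e.g. produced by a classifier on the query's features).
    p_y_given_c: N lists of K floats, the likelihood of each candidate
                 under each of the K semantic concepts.
    Returns a list of N scores; candidates are ranked by descending score.
    """
    return [sum(p * q for p, q in zip(row, p_c_given_x))
            for row in p_y_given_c]

# Toy example: K = 3 concepts, N = 2 candidate texts for an image query.
posterior = [0.7, 0.2, 0.1]          # query strongly suggests concept 0
likelihoods = [[0.9, 0.05, 0.05],    # candidate 0 fits concept 0
               [0.1, 0.6, 0.3]]      # candidate 1 fits concept 1
scores = cross_modal_score(posterior, likelihoods)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Here candidate 0 ranks first, since its concept profile matches the query's posterior; the latent and combinative variants differ only in where the concepts come from, not in this scoring step.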
Communicated by L. Zhang.
Xie, L., Pan, P. & Lu, Y. Analyzing semantic correlation for cross-modal retrieval. Multimedia Systems 21, 525–539 (2015). https://doi.org/10.1007/s00530-014-0397-6