Cross-Media Retrieval via Semantic Entity Projection

  • Conference paper
MultiMedia Modeling (MMM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Abstract

Cross-media retrieval is becoming increasingly important. To address this challenging problem, most existing approaches project heterogeneous features into a unified feature space so that similarities across media types can be computed. However, this unified feature space usually has no explicit semantic meaning, so it may ignore hints contained in the original media content and cannot fully measure the similarities among different media types. To address these issues, we propose a new approach to cross-media retrieval via semantic entity projection (SEP). Our approach consists of three main steps. First, an entity level with fine-grained semantics is constructed between low-level features and high-level concepts, which helps bridge the semantic gap to a certain extent. Then, an entity projection from low-level features to the entity level is learned by minimizing both the cross-media correlation error and the single-media reconstruction error, yielding a unified feature space with explicit semantic meanings. Finally, the semantic abstraction of high-level concepts is generated using logistic regression and used to conduct cross-media retrieval. Experimental results on the Wikipedia dataset show the effectiveness of the proposed approach.
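
The second step of the abstract can be illustrated with a minimal sketch. The abstract does not give the exact objective, so everything below is an assumption made for illustration: a shared entity-score matrix E for paired images and texts, linear projections P_img and P_txt, a squared-error form for both the reconstruction and correlation terms, a ridge penalty, and an alternating closed-form solver. The function names and the parameters lam and reg are hypothetical. The logistic regression step is omitted, and retrieval is done directly in the entity space by cosine similarity.

```python
import numpy as np


def objective(X_img, X_txt, E, P_img, P_txt, lam, reg):
    """Assumed objective: reconstruction + lam * correlation + ridge penalty."""
    Z_img, Z_txt = X_img @ P_img, X_txt @ P_txt
    recon = np.sum((Z_img - E) ** 2) + np.sum((Z_txt - E) ** 2)
    corr = np.sum((Z_img - Z_txt) ** 2)
    ridge = np.sum(P_img ** 2) + np.sum(P_txt ** 2)
    return recon + lam * corr + reg * ridge


def learn_entity_projections(X_img, X_txt, E, lam=1.0, reg=1e-2, n_iter=10):
    """Alternate exact ridge updates for each projection (illustrative only)."""
    k = E.shape[1]
    P_img = np.zeros((X_img.shape[1], k))
    P_txt = np.zeros((X_txt.shape[1], k))
    # Left-hand sides of the normal equations do not change across iterations.
    G_img = (1.0 + lam) * X_img.T @ X_img + reg * np.eye(X_img.shape[1])
    G_txt = (1.0 + lam) * X_txt.T @ X_txt + reg * np.eye(X_txt.shape[1])
    history = [objective(X_img, X_txt, E, P_img, P_txt, lam, reg)]
    for _ in range(n_iter):
        # Each update minimizes the objective over one projection, other fixed.
        P_img = np.linalg.solve(G_img, X_img.T @ (E + lam * (X_txt @ P_txt)))
        P_txt = np.linalg.solve(G_txt, X_txt.T @ (E + lam * (X_img @ P_img)))
        history.append(objective(X_img, X_txt, E, P_img, P_txt, lam, reg))
    return P_img, P_txt, history


def retrieve(query_z, gallery_z):
    """Rank gallery items by cosine similarity to each query, best first."""
    q = query_z / (np.linalg.norm(query_z, axis=1, keepdims=True) + 1e-12)
    g = gallery_z / (np.linalg.norm(gallery_z, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(q @ g.T), axis=1)


# Synthetic paired data (not the Wikipedia dataset): both media are noisy
# linear views of the same hidden entity scores.
rng = np.random.default_rng(0)
n, k, d_img, d_txt = 60, 5, 12, 8
E = rng.normal(size=(n, k))
X_img = E @ rng.normal(size=(k, d_img)) + 0.01 * rng.normal(size=(n, d_img))
X_txt = E @ rng.normal(size=(k, d_txt)) + 0.01 * rng.normal(size=(n, d_txt))

P_img, P_txt, history = learn_entity_projections(X_img, X_txt, E)
ranking = retrieve(X_img @ P_img, X_txt @ P_txt)  # image query, text gallery
top1 = float(np.mean(ranking[:, 0] == np.arange(n)))
print("objective start/end:", history[0], history[-1])
print("top-1 image-to-text accuracy on synthetic pairs:", top1)
```

Because each update solves its subproblem exactly, the assumed objective cannot increase between iterations. The point of the sketch is only that, once both media are projected onto the same entity axes, images and texts become directly comparable and each dimension of the shared space has a nameable meaning.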

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61371128 and 61532005, and by the National Hi-Tech Research and Development Program of China (863 Program) under Grants 2014AA015102 and 2012AA012503.

Corresponding author

Correspondence to Yuxin Peng.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Huang, L., Peng, Y. (2016). Cross-Media Retrieval via Semantic Entity Projection. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_23

  • DOI: https://doi.org/10.1007/978-3-319-27671-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27670-0

  • Online ISBN: 978-3-319-27671-7

  • eBook Packages: Computer Science, Computer Science (R0)
