Cross-Media Retrieval via Semantic Entity Projection

Huang, Lei; Peng, Yuxin

doi:10.1007/978-3-319-27671-7_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Cross-media retrieval is becoming increasingly important. To address this challenging problem, most existing approaches project heterogeneous features into a unified feature space to facilitate similarity computation. However, this unified feature space usually has no explicit semantic meaning, so it may ignore hints contained in the original media content and thus cannot fully measure the similarities among different media types. To address these issues, we propose a new approach to cross-media retrieval via semantic entity projection (SEP). Our approach consists of three main steps. First, an entity level with fine-grained semantics is constructed between low-level features and high-level concepts, so as to help bridge the semantic gap to a certain extent. Then, an entity projection from low-level features to the entity level is learned by minimizing both the cross-media correlation error and the single-media reconstruction error, which yields a unified feature space with explicit semantic meaning. Finally, the semantic abstraction of high-level concepts is generated by logistic regression and used to conduct cross-media retrieval. Experimental results on the Wikipedia dataset show the effectiveness of the proposed approach.
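
The three steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the exact objective, so the loss below (squared reconstruction error per medium, plus a squared cross-media correlation term weighted by `lam`, plus a ridge penalty `mu`) is an assumption, as are the alternating closed-form solver, the hand-written softmax regression standing in for logistic regression, and every function and parameter name. The entity-level targets `E_img` and `E_txt` are assumed to be given.

```python
import numpy as np

def objective(X_img, X_txt, E_img, E_txt, W_img, W_txt, lam, mu):
    """Assumed loss: reconstruction + lam * correlation + mu * ridge penalty."""
    P_img, P_txt = X_img @ W_img, X_txt @ W_txt
    recon = np.sum((P_img - E_img) ** 2) + np.sum((P_txt - E_txt) ** 2)
    corr = np.sum((P_img - P_txt) ** 2)
    ridge = np.sum(W_img ** 2) + np.sum(W_txt ** 2)
    return recon + lam * corr + mu * ridge

def _update(X, E, other_proj, lam, mu):
    """Closed-form minimiser for one projection with the other held fixed."""
    A = (1.0 + lam) * (X.T @ X) + mu * np.eye(X.shape[1])
    B = X.T @ (E + lam * other_proj)
    return np.linalg.solve(A, B)

def learn_entity_projection(X_img, X_txt, E_img, E_txt, lam=1.0, mu=0.1, n_iter=20):
    """Alternate exact updates of the image and text projections."""
    k = E_img.shape[1]
    W_img = np.zeros((X_img.shape[1], k))
    W_txt = np.zeros((X_txt.shape[1], k))
    history = []
    for _ in range(n_iter):
        W_img = _update(X_img, E_img, X_txt @ W_txt, lam, mu)
        W_txt = _update(X_txt, E_txt, X_img @ W_img, lam, mu)
        history.append(objective(X_img, X_txt, E_img, E_txt, W_img, W_txt, lam, mu))
    return W_img, W_txt, history

def softmax(S):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    S = S - S.max(axis=1, keepdims=True)
    P = np.exp(S)
    return P / P.sum(axis=1, keepdims=True)

def fit_softmax(Z, y, n_classes, lr=0.5, n_iter=300):
    """Multinomial logistic regression on entity-level features, by gradient descent."""
    V = np.zeros((Z.shape[1], n_classes))
    Y = np.eye(n_classes)[y]  # one-hot concept labels
    for _ in range(n_iter):
        V -= lr * (Z.T @ (softmax(Z @ V) - Y)) / Z.shape[0]
    return V

def rank_texts(x_img, X_txt, W_img, W_txt, V):
    """Rank texts for one image query by cosine similarity of concept posteriors."""
    q = softmax((x_img[None, :] @ W_img) @ V).ravel()
    T = softmax((X_txt @ W_txt) @ V)
    q = q / np.linalg.norm(q)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    return np.argsort(-(T @ q))  # best match first
```

In this sketch each update solves its subproblem exactly, so the recorded objective values should not increase from one iteration to the next. Text-to-image retrieval would follow the same pattern with the roles of the two media swapped.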

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61371128 and 61532005, and by the National Hi-Tech Research and Development Program of China (863 Program) under Grants 2014AA015102 and 2012AA012503.

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University, Beijing, 100871, China
Lei Huang & Yuxin Peng

Corresponding author

Correspondence to Yuxin Peng.

Editor information

Editors and Affiliations

University of Texas at San Antonio, San Antonio, USA
Qi Tian
Dept. of Information Engineering, University of Trento, Povo, Trento, Italy
Nicu Sebe
EECS, University of Central Florida, Orlando, Florida, USA
Guo-Jun Qi
EURECOM, Sophia-Antipolis, France
Benoit Huet
Hefei University of Technology, Hefei, Anhui, China
Richang Hong
School of Computing and Information, Hefei University of Technology, Hefei, Anhui, China
Xueliang Liu

About this paper

Cite this paper

Huang, L., Peng, Y. (2016). Cross-Media Retrieval via Semantic Entity Projection. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_23

DOI: https://doi.org/10.1007/978-3-319-27671-7_23
Published: 03 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science, Computer Science (R0)
