Zero-Shot Cross-Media Retrieval with External Knowledge

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 819)

Abstract

Cross-media retrieval, which lets users retrieve results across media types such as image and text, has drawn much attention recently. Existing methods mainly address the setting where the training data covers all categories in the testing data. In the real world, however, the number of categories is unbounded, and it is impossible to include them all in the training data. Because of this limited scalability, existing methods perform poorly when retrieving unseen categories. To address both the “heterogeneity gap” and the gap between seen and unseen categories, this paper proposes a new approach that models both multimedia and external knowledge information. Common semantic representations are generated jointly from media features and category weight vectors, which are learned from online encyclopedias. Experiments on two widely used datasets show the effectiveness of our approach for zero-shot cross-media retrieval.
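As a rough illustration of the idea in the abstract, the sketch below scores each media feature against a set of category weight vectors to obtain a semantic representation, then ranks text items against an image query by cosine similarity. It is not the authors' implementation. The function names, the toy vectors, and the assumption that image and text features already live in the same space as the category weight vectors are all hypothetical; in the paper the category weight vectors are learned from online encyclopedias, whereas here they are hard-coded.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(dot(u, u))
    nv = math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def semantic_representation(feature, category_vectors):
    """Score a media feature against every category weight vector."""
    return [dot(feature, w) for w in category_vectors]


def retrieve(query_repr, gallery_reprs):
    """Return gallery indices sorted by descending cosine similarity."""
    scores = [cosine(query_repr, g) for g in gallery_reprs]
    return sorted(range(len(gallery_reprs)), key=lambda i: -scores[i])


# Hypothetical weight vectors for two unseen categories (assumed to come
# from external knowledge such as encyclopedia text; hard-coded here).
category_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Toy features, assumed already mapped into the same 3-dimensional space.
image_feature = [0.9, 0.1, 0.2]
text_features = [[0.1, 0.8, 0.3], [0.8, 0.2, 0.1]]

query = semantic_representation(image_feature, category_vectors)
gallery = [semantic_representation(t, category_vectors) for t in text_features]
ranking = retrieve(query, gallery)
print(ranking)
```

Because the category weight vectors are an input rather than something tied to the training labels, new categories can in principle be handled by supplying new vectors, which is the property the zero-shot setting relies on.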

Notes

  1. https://www.tensorflow.org

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61371128 and 61532005.

Author information

Corresponding author

Correspondence to Yuxin Peng.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chi, J., Huang, X., Peng, Y. (2018). Zero-Shot Cross-Media Retrieval with External Knowledge. In: Huet, B., Nie, L., Hong, R. (eds) Internet Multimedia Computing and Service. ICIMCS 2017. Communications in Computer and Information Science, vol 819. Springer, Singapore. https://doi.org/10.1007/978-981-10-8530-7_20

  • DOI: https://doi.org/10.1007/978-981-10-8530-7_20

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8529-1

  • Online ISBN: 978-981-10-8530-7

  • eBook Packages: Computer Science (R0)
