Zero-Shot Cross-Media Retrieval with External Knowledge

Chi, Jingze; Huang, Xin; Peng, Yuxin

doi:10.1007/978-981-10-8530-7_20

Conference paper
First Online: 01 March 2018

Part of the book series: Communications in Computer and Information Science (CCIS, volume 819)

Abstract

Cross-media retrieval, which lets users retrieve results across different media types such as image and text, has drawn much attention recently. Existing methods mainly focus on the setting where the training data covers all the categories in the testing data. However, the number of categories in the real world is practically unlimited, and it is impossible to include all of them in the training data. Because of this limited scalability, existing methods are not effective when retrieving with unseen categories. To address both the “heterogeneity gap” and the gap between seen and unseen categories, this paper proposes a new approach that models both multimedia and external knowledge information. The common semantic representations are generated jointly from media features and category weight vectors, which are learned with the help of online encyclopedias. Experiments on two widely used datasets show the effectiveness of our approach for zero-shot cross-media retrieval.
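
The abstract states only that the common semantic representations combine media features with encyclopedia-derived category weight vectors; it does not give the model itself. To illustrate why such a representation can reach categories absent from training, here is a minimal sketch that uses a ConSE-style convex combination, a swapped-in technique and not the authors' method: each image or text item is represented by the probability-weighted sum of seen-category vectors, and retrieval ranks items of the other medium by cosine similarity. The classifier scores, the 2-dimensional category vectors, and the function names `semantic_embedding` and `retrieve` are hypothetical toy assumptions.

```python
import math


def semantic_embedding(seen_probs, category_vectors):
    """Map one item into a shared semantic space (hypothetical sketch).

    seen_probs: scores over the seen categories from an assumed
        per-medium classifier (assumed to sum to 1).
    category_vectors: one vector per seen category; in the paper such
        vectors come from online encyclopedias, here they are toy values.
    Returns the L2-normalised probability-weighted sum of the vectors.
    """
    dim = len(category_vectors[0])
    emb = [0.0] * dim
    for p, vec in zip(seen_probs, category_vectors):
        for i in range(dim):
            emb[i] += p * vec[i]
    norm = math.sqrt(sum(x * x for x in emb)) or 1.0  # guard against zero
    return [x / norm for x in emb]


def retrieve(query, gallery):
    """Return gallery indices ranked by cosine similarity to the query.

    All vectors are already L2-normalised, so the dot product equals
    the cosine similarity.
    """
    sims = [sum(q * g for q, g in zip(query, item)) for item in gallery]
    return sorted(range(len(gallery)), key=lambda j: -sims[j])


# Toy data: 3 seen categories with 2-d category vectors (assumed values).
category_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
image_probs = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
text_probs = [[0.05, 0.9, 0.05], [0.85, 0.1, 0.05]]

images = [semantic_embedding(p, category_vectors) for p in image_probs]
texts = [semantic_embedding(p, category_vectors) for p in text_probs]

# Image-to-text retrieval: one ranked list of text indices per image query.
rankings = [retrieve(img, texts) for img in images]
print(rankings)  # [[1, 0], [0, 1]]
```

With these toy numbers, image 0 (mostly category 0) ranks text 1 first and image 1 (mostly category 1) ranks text 0 first. Because both media are projected onto the same category vectors, an item of an unseen category would land between the seen categories it resembles, which is the intuition behind using external knowledge for zero-shot retrieval.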

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61371128 and 61532005.

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University, Beijing, 100871, China
Jingze Chi, Xin Huang & Yuxin Peng

Corresponding author

Correspondence to Yuxin Peng.

Editor information

Editors and Affiliations

Multimedia Communications Department, EURECOM, Sophia Antipolis, France
Benoit Huet
Shandong University, Qingdao, China
Liqiang Nie
Hefei University of Technology, Hefei, China
Richang Hong

About this paper

Cite this paper

Chi, J., Huang, X., Peng, Y. (2018). Zero-Shot Cross-Media Retrieval with External Knowledge. In: Huet, B., Nie, L., Hong, R. (eds) Internet Multimedia Computing and Service. ICIMCS 2017. Communications in Computer and Information Science, vol 819. Springer, Singapore. https://doi.org/10.1007/978-981-10-8530-7_20

DOI: https://doi.org/10.1007/978-981-10-8530-7_20
Published: 01 March 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8529-1
Online ISBN: 978-981-10-8530-7
eBook Packages: Computer Science; Computer Science (R0)