Abstract
Cross-media retrieval has drawn much attention recently, by which users can retrieve results across different media types like image and text. The existing methods mainly focus on the condition where the training data covers all the categories in the testing data. However, the number of categories is infinite in real world and it is impossible to include all categories in the training data. Due to the limitation of scalability, the performance of existing methods will be not effective when retrieving with unseen categories. For addressing the issues of both “heterogeneity gap” and the gap of seen and unseen categories, this paper proposes a new approach to model both multimedia and external knowledge information. The common semantic representations are generated jointly by media features and category weight vectors which are learned by utilizing online encyclopedias. Experiment on two widely-used datasets shows the effectiveness of our approach for zero-shot cross-media retrieval.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
References
Lew, M.S., Sebe, N., Djeraba, C., Jain, R.: Content-based multimedia information retrieval: state of the art and challenges. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2(1), 1–9 (2006)
Rasiwasia, N., Costa Pereira, J., Coviello, E., Doyle, G., Lanckriet, G.R., Levy, R., Vasconcelos, N.: A new approach to cross-modal multimedia retrieval. In: ACM International Conference on Multimedia (ACM MM), pp. 251–260 (2010)
Ranjan, V., Rasiwasia, N., Jawahar, C.: Multi-label cross-modal retrieval. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4094–4102 (2015)
Li, D., Dimitrova, N., Li, M., Sethi, I.K.: Multimedia content processing through cross-modal association. In: ACM International Conference on Multimedia (ACM MM), pp. 604–611 (2003)
Peng, Y., Huang, X., Qi, J.: Cross-media shared representation by hierarchical learning with multiple deep networks. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 3846–3853 (2016)
Wei, Y., Zhao, Y., Lu, C., Wei, S., Liu, L., Zhu, Z., Yan, S.: Cross-modal retrieval with CNN visual features: a new baseline. IEEE Trans. Cybern. (TCYB) 47(2), 449–460 (2017)
Feng, F., Wang, X., Li, R.: Cross-modal retrieval with correspondence autoencoder. In: ACM International Conference on Multimedia (ACM MM), pp. 7–16 (2014)
Palatucci, M., Pomerleau, D., Hinton, G.E., Mitchell, T.M.: Zero-shot learning with semantic output codes. In: Annual Conference on Neural Information Processing Systems (NIPS), pp. 1410–1418 (2009)
Larochelle, H., Erhan, D., Bengio, Y.: Zero-data learning of new tasks. In: AAAI Conference on Artificial Intelligence (AAAI), pp. 646–651 (2008)
Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 36(3), 453–465 (2014)
Akata, Z., Perronnin, F., Harchaoui, Z., Schmid, C.: Label-embedding for attribute-based classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 819–826 (2013)
Elhoseiny, M., Saleh, B., Elgammal, A.: Write a classifier: zero-shot learning using purely textual descriptions. In: IEEE International Conference on Computer Vision (ICCV), pp. 2584–2591 (2014)
Ba, J.L., Swersky, K., Fidler, S., Salakhutdinov, R.: Predicting deep zero-shot convolutional neural networks using textual descriptions. In: IEEE International Conference on Computer Vision (ICCV), pp. 4247–4255 (2016)
Zhai, X., Peng, Y., Xiao, J.: Learning cross-media joint representation with sparse and semisupervised regularization. IEEE Trans. Circuits Syst. Video Technol. (TCSVT) 24(6), 965–978 (2014)
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: International Conference on Machine Learning (ICML), pp. 689–696 (2011)
He, Y., Xiang, S., Kang, C., Wang, J., Pan, C.: Cross-modal retrieval via deep and bidirectional representation learning. IEEE Trans. Multimed. (TMM) 18(7), 1363–1377 (2016)
Kankuekul, P., Kawewong, A., Tangruamsub, S., Hasegawa, O.: Online incremental attribute-based zero-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3657–3664 (2012)
Wu, S., Bondugula, S., Luisier, F., Zhuang, X., Natarajan, P.: Zero-shot event detection using multi-modal fusion of weakly supervised concepts. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2665–2672 (2014)
Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1778–1785 (2009)
Parikh, D., Grauman, K.: Relative attributes. In: IEEE International Conference on Computer Vision (ICCV), pp. 503–510 (2011)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255 (2009)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Annual Conference on Neural Information Processing Systems (NIPS), pp. 1097–1105 (2012)
Farhadi, A., Hejrati, M., Sadeghi, M.A., Young, P., Rashtchian, C., Hockenmaier, J., Forsyth, D.: Every picture tells a story: generating sentences from images. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 15–29. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_2
Hotelling, H.: Relations between two sets of variates. Biometrika 28(3/4), 321–377 (1936)
Hardoon, D.R., Szedmák, S.R., Shawe-Taylor, J.R.: Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16(12), 2639–2664 (2004)
Acknowledgments
This work was supported by National Natural Science Foundation of China under Grants 61371128 and 61532005.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chi, J., Huang, X., Peng, Y. (2018). Zero-Shot Cross-Media Retrieval with External Knowledge. In: Huet, B., Nie, L., Hong, R. (eds) Internet Multimedia Computing and Service. ICIMCS 2017. Communications in Computer and Information Science, vol 819. Springer, Singapore. https://doi.org/10.1007/978-981-10-8530-7_20
Download citation
DOI: https://doi.org/10.1007/978-981-10-8530-7_20
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8529-1
Online ISBN: 978-981-10-8530-7
eBook Packages: Computer ScienceComputer Science (R0)