Abstract
In recent years, cross-modal methods have been extensively studied in the multimedia literature. Many existing cross-modal methods rely on labeled training data, which is difficult to collect. In this paper, we propose a cross-modal self-taught learning (CMSTL) algorithm that is learned from unlabeled multi-modal data. CMSTL adopts a two-stage self-taught scheme. In the multi-modal topic learning stage, both intra-modal similarity and multi-modal correlation are preserved, and each modality is given its own weight when learning the multi-modal topics. In the projection stage, soft assignment is used to learn the projection functions. Experimental results on Wikipedia articles and NUS-WIDE show the effectiveness of CMSTL for both cross-modal retrieval and image hashing.
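The abstract does not state the CMSTL objective, so the Python/NumPy sketch below only illustrates the general shape of a two-stage self-taught pipeline; it is not the authors' formulation. Its assumptions are: intra-modal similarity is modelled with a per-modality kNN heat-kernel graph; multi-modal correlation is approximated by sharing one spectral embedding across row-aligned modalities; the modality weights are fixed inputs rather than learned; "soft assignment" is read as softmax topic memberships used as regression targets; and the projection functions are ridge regressions. The function names `knn_affinity`, `learn_topics`, `soft_assign` and `fit_projection` are hypothetical.

```python
import numpy as np


def knn_affinity(X, k=5):
    """Symmetric kNN heat-kernel affinity for one modality (intra-modal similarity)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Bandwidth from the median squared distance (an assumed heuristic).
    W = np.exp(-d2 / np.median(d2[d2 > 0]))
    np.fill_diagonal(W, 0.0)
    # Keep each item's k strongest neighbours, then symmetrise the mask.
    idx = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros(W.shape, dtype=bool)
    mask[np.arange(W.shape[0])[:, None], idx] = True
    mask = mask | mask.T
    return np.where(mask, W, 0.0)


def learn_topics(views, weights, n_topics, k=5):
    """Stage 1 (sketch): shared topic embedding from a weighted sum of Laplacians.

    Row i of every view is assumed to describe the same item, so sharing one
    embedding across modalities stands in for multi-modal correlation.
    The per-modality weights are fixed inputs here (an assumption).
    """
    n = views[0].shape[0]
    L = np.zeros((n, n))
    for X, w in zip(views, weights):
        A = knn_affinity(X, k)
        L += w * (np.diag(A.sum(axis=1)) - A)
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    # Drop the trivial constant eigenvector; keep the next n_topics.
    return vecs[:, 1:n_topics + 1]


def soft_assign(Y, temperature=0.1):
    """Row-wise softmax turning embedding coordinates into soft topic memberships."""
    Z = Y / temperature
    Z -= Z.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def fit_projection(X, T, lam=1e-2):
    """Stage 2 (sketch): ridge regression P minimising ||X P - T||^2 + lam ||P||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)


# Toy paired data: 60 items whose image and text features share a 3-d latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
img = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(60, 20))
txt = latent @ rng.normal(size=(3, 15)) + 0.1 * rng.normal(size=(60, 15))

Y = learn_topics([img, txt], weights=[0.6, 0.4], n_topics=4)
T = soft_assign(Y)
P_img, P_txt = fit_projection(img, T), fit_projection(txt, T)

# Cross-modal retrieval: rank all images against the text of item 0.
query = txt[0] @ P_txt
scores = (img @ P_img) @ query
ranking = np.argsort(-scores)

# Image hashing: threshold each projected dimension at its training median.
H = img @ P_img
codes = (H > np.median(H, axis=0)).astype(np.uint8)
```

Regressing onto real-valued memberships and binarising only after projection is one plausible way to let the same projection functions serve both retrieval (compare projected vectors) and hashing (threshold them); the paper's actual choice of targets and weights may differ.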
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Xie, L., Pan, P., Lu, Y., Jiang, S. (2015). Cross-Modal Self-Taught Learning for Image Retrieval. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8935. Springer, Cham. https://doi.org/10.1007/978-3-319-14445-0_23
DOI: https://doi.org/10.1007/978-3-319-14445-0_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14444-3
Online ISBN: 978-3-319-14445-0
eBook Packages: Computer Science, Computer Science (R0)