Abstract
Subspace learning (i.e., learning an image, text, or latent subspace) is an essential part of cross-media retrieval. Most existing methods map the different modalities into a latent subspace pre-defined by category labels. However, such labels require extensive manual annotation, and the label-defined subspace may not represent the semantic information precisely. In this paper, we propose a novel unsupervised concept-learning approach in the text subspace for cross-media retrieval: images and texts are mapped into a conceptual text subspace by neural networks trained with self-learned concept labels, so the well-established text subspace is more reasonable and practical than a pre-defined latent subspace. Experiments on two benchmark datasets demonstrate that the proposed method not only outperforms state-of-the-art unsupervised methods but also achieves better performance than several supervised methods.
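The core idea above can be illustrated with a minimal sketch: self-learned concept labels obtained by clustering text embeddings (standing in for the paper's unsupervised concept discovery), and retrieval by cosine similarity once both modalities are projected into the shared text subspace. All function names here are illustrative, not the authors' implementation; the mapping networks themselves are omitted.

```python
import numpy as np

def self_learned_concepts(text_emb, k, iters=20, seed=0):
    """Assign an unsupervised 'concept label' to each text embedding
    via plain k-means clustering (a stand-in for concept discovery)."""
    rng = np.random.default_rng(seed)
    centers = text_emb[rng.choice(len(text_emb), k, replace=False)]
    for _ in range(iters):
        # Squared distance of every embedding to every concept center.
        d = ((text_emb[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = text_emb[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return labels, centers

def retrieve(query_vec, gallery):
    """Rank gallery vectors in the shared text subspace by cosine
    similarity to a (projected) query vector; returns indices."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))
```

These concept labels would then serve as training targets for the image- and text-mapping networks; retrieval reduces to nearest-neighbor search in the learned text subspace.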
Acknowledgement
This project was supported by Shenzhen Peacock Plan (20130408-183003656), Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467), and Guangdong Science and Technology Project (2014B010117007).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Fan, M., Wang, W., Dong, P., Wang, R., Li, G. (2018). Unsupervised Concept Learning in Text Subspace for Cross-Media Retrieval. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds.) Advances in Multimedia Information Processing – PCM 2017. Lecture Notes in Computer Science, vol. 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77379-7
Online ISBN: 978-3-319-77380-3