Abstract
Subspace learning (i.e., learning an image, text, or latent subspace) is an essential part of cross-media retrieval. Most existing methods map the different modalities into a latent subspace pre-defined by category labels. However, such labels require extensive manual annotation, and the label-defined subspace may not represent the semantic information precisely. In this paper, we propose a novel unsupervised concept-learning approach in the text subspace for cross-media retrieval: images and texts are mapped into a conceptual text subspace by neural networks trained with self-learned concept labels, so the well-established text subspace is more reasonable and practical than a pre-defined latent subspace. Experiments on two benchmark datasets demonstrate that the proposed method not only outperforms state-of-the-art unsupervised methods but also achieves better performance than several supervised methods.
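The core idea above can be illustrated with a minimal sketch: self-learned concept labels obtained by clustering text embeddings (standing in for the paper's unsupervised concept discovery), and retrieval by cosine similarity once both modalities are projected into the shared text subspace. All function names here are illustrative, not the authors' implementation; the mapping networks themselves are omitted.

```python
import numpy as np

def self_learned_concepts(text_emb, k, iters=20, seed=0):
    """Assign an unsupervised 'concept label' to each text embedding
    via plain k-means clustering (a stand-in for concept discovery)."""
    rng = np.random.default_rng(seed)
    centers = text_emb[rng.choice(len(text_emb), k, replace=False)]
    for _ in range(iters):
        # Squared distance of every embedding to every concept center.
        d = ((text_emb[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = text_emb[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return labels, centers

def retrieve(query_vec, gallery):
    """Rank gallery vectors in the shared text subspace by cosine
    similarity to a (projected) query vector; returns indices."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))
```

These concept labels would then serve as training targets for the image- and text-mapping networks; retrieval reduces to nearest-neighbor search in the learned text subspace.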
Acknowledgement
This project was supported by Shenzhen Peacock Plan (20130408-183003656), Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467), and Guangdong Science and Technology Project (2014B010117007).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Fan, M., Wang, W., Dong, P., Wang, R., Li, G. (2018). Unsupervised Concept Learning in Text Subspace for Cross-Media Retrieval. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds.) Advances in Multimedia Information Processing – PCM 2017. Lecture Notes in Computer Science, vol. 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77379-7
Online ISBN: 978-3-319-77380-3