Abstract
As a key element of social media, social images play an increasingly important role in daily life. Hashing has recently emerged as a promising approach to support fast social image search. Leveraging semantic labels has proven effective for hashing; however, such labels tend to be limited in both quantity and quality. In this paper, we propose Multi-Task Multi-modal Semantic Hashing (MTMSH) to index large-scale social image collections with limited supervision. MTMSH improves search accuracy by enriching semantic information in two ways. First, the latent multi-modal structure among labeled and unlabeled data is explored by Multiple Anchor Graph Learning (MAGL) to increase the quantity of semantic information. Second, multi-task based Share Hash Space Learning (SHSL) is proposed to improve semantic quality. MAGL and SHSL are then integrated into a joint framework in which the semantic functions and hash functions mutually reinforce each other, and an alternating optimization algorithm, whose time complexity is linear in the size of the training data, is developed. Experimental results on two large-scale real-world image datasets demonstrate the effectiveness and efficiency of MTMSH.
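The abstract leaves the details of MAGL to the paper body. As a rough illustration only, the sketch below builds a truncated anchor graph in the general spirit of anchor-graph methods (each sample connects to its few nearest anchors with normalized Gaussian weights) and binarizes linear projections into hash codes. The function names (`anchor_graph`, `hash_codes`) and parameters (`s`, `sigma`, projection matrix `W`) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def anchor_graph(X, anchors, s=3, sigma=1.0):
    """Truncated similarity matrix Z (n x m): each sample is linked to its
    s nearest anchors with Gaussian weights; each row sums to 1.
    NOTE: illustrative sketch, not MTMSH's exact MAGL construction."""
    # squared Euclidean distances between samples and anchors
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    Z = np.zeros_like(d2)
    nearest = np.argsort(d2, axis=1)[:, :s]  # indices of s closest anchors
    for i in range(X.shape[0]):
        w = np.exp(-d2[i, nearest[i]] / (2 * sigma ** 2))
        Z[i, nearest[i]] = w / w.sum()       # normalize into a probability row
    return Z

def hash_codes(X, W):
    """Binarize linear projections into +/-1 hash codes (assumed hash form)."""
    return np.sign(X @ W)
```

Because each row of `Z` has only `s` nonzeros, graph operations scale linearly in the number of samples, which is consistent with the linear-time training claim above.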
© 2017 Springer International Publishing AG
Cite this paper
Xie, L., Zhu, L., Cheng, Z. (2017). Multi-Task Multi-modal Semantic Hashing for Web Image Retrieval with Limited Supervision. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol. 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_38
Print ISBN: 978-3-319-51810-7
Online ISBN: 978-3-319-51811-4