Multi-modal Image Re-ranking with Autoencoders and Click Semantics

  • Conference paper
MultiMedia Modeling (MMM 2016)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9516))

Abstract

Image re-ranking is an effective way to improve text-based image retrieval. However, building an effective re-ranking algorithm is hindered by two issues: first, the visual features extracted from images for re-ranking are too shallow to represent all the information the images contain; second, the associated text often fails to match the semantics of the images. In this paper, we use autoencoders to extract deeper image features and exploit click data to bridge the semantic gap between query words and image semantics. We propose a graph-based algorithm (MIR-AC) that adaptively integrates autoencoder features and click information by constructing two manifolds with updating weights. In particular, MIR-AC performs image re-ranking through an iterative optimization process in which the image ranking scores and the manifold weights are updated alternately. Experiments on a real-world dataset demonstrate that MIR-AC outperforms the compared state-of-the-art methods in image re-ranking.
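
The alternating update described in the abstract can be illustrated with a minimal sketch. The abstract does not give the MIR-AC objective, so the code below assumes a common multi-graph ranking formulation from the literature: minimize sum_k alpha_k^r f^T L_k f + lam ||f - y||^2 subject to the weights alpha_k summing to one. The function names, the Gaussian affinity, and the parameters `sigma`, `lam`, `r`, and `n_iter` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian-kernel affinity between rows of X (assumed graph construction)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def normalized_laplacian(W):
    """Symmetric normalized graph Laplacian I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.eye(W.shape[0]) - S

def rerank(laplacians, y, lam=1.0, r=2.0, n_iter=20):
    """Alternate between ranking scores f and graph weights alpha.

    Assumed objective (not confirmed by the source):
        sum_k alpha_k^r * f^T L_k f + lam * ||f - y||^2,  sum_k alpha_k = 1.
    """
    K = len(laplacians)
    n = y.shape[0]
    alpha = np.full(K, 1.0 / K)  # start with equal weights
    f = y.astype(float).copy()
    for _ in range(n_iter):
        # Step 1: with alpha fixed, f has a closed-form solution.
        L = sum(a ** r * Lk for a, Lk in zip(alpha, laplacians))
        f = np.linalg.solve(L + lam * np.eye(n), lam * y)
        # Step 2: with f fixed, smoother graphs (lower cost) get larger weights.
        cost = np.array([max(f @ Lk @ f, 1e-12) for Lk in laplacians])
        w = (1.0 / cost) ** (1.0 / (r - 1.0))
        alpha = w / w.sum()
    return f, alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    deep_feats = rng.normal(size=(12, 5))   # stand-in for autoencoder features
    click_feats = rng.normal(size=(12, 3))  # stand-in for click-based features
    y = np.linspace(1.0, 0.0, 12)           # stand-in for initial text-based scores
    graphs = [normalized_laplacian(gaussian_affinity(X)) for X in (deep_feats, click_feats)]
    scores, weights = rerank(graphs, y)
    print(np.argsort(-scores), weights)
```

In this sketch the initial text-based ranking enters through `y`, and sorting the returned scores in descending order gives the re-ranked list. The exponent `r > 1` keeps the weights from collapsing onto a single graph; how the paper itself regularizes the weights is not stated in the abstract.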

Acknowledgement

This work is supported by the Natural Science Foundation of China (No. 61202145, No. 61300192, No. 61472110), the Program for New Century Excellent Talents in University (No. NECT-12-0323), Zhejiang Provincial Natural Science Foundation of China for Distinguished Young Scholars (No. LR15F020002), the Natural Science Foundation of Fujian Province of China under Grants (No. 2014J01256), and the education and research Foundation of Fujian Province of China under Grants (No. JB14082, JB12252S).

Corresponding author

Correspondence to Chaoqun Hong.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Tang, C., Zhu, Q., Hong, C., Yu, J. (2016). Multi-modal Image Re-ranking with Autoencoders and Click Semantics. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_51

  • DOI: https://doi.org/10.1007/978-3-319-27671-7_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27670-0

  • Online ISBN: 978-3-319-27671-7

  • eBook Packages: Computer Science, Computer Science (R0)
