Image Tag Recommendation via Deep Cross-Modal Correlation Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10035)

Abstract

In this paper, a novel image tag recommendation framework is developed by fusing deep multimodal feature representation with cross-modal correlation mining, so that the most appropriate and relevant tags can be presented for an image and image retrieval becomes more accurate. Such image tag recommendation can be modeled as an inter-related correlation distribution over deep multimodal visual and semantic representations of images and tags, in which the most important task is to build a more effective cross-modal correlation and to measure the degree to which images and tags are related. Our experiments on a large amount of public data have obtained very positive results.
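
The idea of relating image features and tag features through a shared correlation space can be illustrated with a small sketch. The abstract does not describe the framework's actual architecture, so the code below is only an assumption-laden stand-in: it uses plain regularized linear canonical correlation analysis (CCA) instead of a deep model, synthetic random vectors in place of real visual and tag embeddings, and hypothetical names (`fit_linear_cca`, `recommend_tags`, `tag_0` ... `tag_7`) that do not come from the paper.

```python
import numpy as np


def fit_linear_cca(X, Y, dim, reg=1e-3):
    """Fit regularized linear CCA on two views whose rows are paired samples."""
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    # Regularized covariance estimates and the cross-covariance.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, corrs, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "Wx": Kx @ U[:, :dim],
        "Wy": Ky @ Vt.T[:, :dim],
        "corrs": corrs[:dim],
    }


def recommend_tags(img_feat, tag_vecs, tag_names, model, k=3):
    """Rank tags by cosine similarity to the image in the shared CCA space."""
    q = (img_feat - model["mean_x"]) @ model["Wx"]
    T = (tag_vecs - model["mean_y"]) @ model["Wy"]
    sims = (T @ q) / (np.linalg.norm(T, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims)[:k]
    return [(tag_names[i], float(sims[i])) for i in order]


# Synthetic paired data (an assumption): both views share a 4-d latent factor.
rng = np.random.default_rng(0)
n, latent, dx, dy = 500, 4, 20, 10
Z = rng.normal(size=(n, latent))
X = Z @ rng.normal(size=(latent, dx)) + 0.1 * rng.normal(size=(n, dx))  # "visual" view
Y = Z @ rng.normal(size=(latent, dy)) + 0.1 * rng.normal(size=(n, dy))  # "tag" view

model = fit_linear_cca(X, Y, dim=4)

# Treat the first 8 tag-view rows as a toy tag vocabulary (hypothetical).
tag_names = [f"tag_{i}" for i in range(8)]
ranked = recommend_tags(X[0], Y[:8], tag_names, model, k=3)
for name, score in ranked:
    print(name, round(score, 3))
```

In this sketch the canonical correlations in `model["corrs"]` play the role of "measuring to what degree the two modalities are related", and the cosine ranking in the shared space plays the role of tag recommendation. A deep variant would replace the two linear projections with learned networks.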

Acknowledgments

This work is supported by the National Key Research and Development Plan (Grant No. 2016YFC0801003). Yuejie Zhang is the corresponding author.

Corresponding author

Correspondence to Yuejie Zhang.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Zhang, X., Jin, C., Zhang, Y., Zhang, T. (2016). Image Tag Recommendation via Deep Cross-Modal Correlation Mining. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_36

  • DOI: https://doi.org/10.1007/978-3-319-47674-2_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47673-5

  • Online ISBN: 978-3-319-47674-2

  • eBook Packages: Computer Science, Computer Science (R0)
