Abstract
This paper develops a novel image tag recommendation framework that fuses deep multimodal feature representation with cross-modal correlation mining, so that the most appropriate and relevant tags can be presented for an image and image retrieval becomes more accurate. Such a tag recommendation pattern can be modeled as an inter-related correlation distribution over deep multimodal visual and semantic representations of images and tags, in which the key problems are to create a more effective cross-modal correlation and to measure the degree to which images and tags are related. Experiments on a large amount of public data yielded very positive results.
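To make the idea of cross-modal correlation concrete, the following is a minimal sketch, not the authors' method: it swaps in plain linear canonical correlation analysis (CCA) for the deep model, projects image features and tag embeddings into a shared space, and ranks candidate tags by cosine similarity. The function names (`fit_linear_cca`, `rank_tags`), the regularization value, and the toy data are all assumptions made for illustration; in practice the inputs would be learned visual features and word embeddings.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def fit_linear_cca(X, Y, k, reg=1e-6):
    """Fit linear CCA on paired rows of X (image features) and Y (tag vectors).

    Returns the two means, the two projection matrices, and the top-k
    canonical correlations. `reg` is a small ridge term (an assumption)
    that keeps the covariance matrices invertible.
    """
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    A = Kx @ U[:, :k]      # projection for the image modality
    B = Ky @ Vt[:k].T      # projection for the tag modality
    return mu_x, mu_y, A, B, s[:k]

def rank_tags(image_feature, tag_vectors, model):
    """Return tag indices sorted from most to least related to the image."""
    mu_x, mu_y, A, B, _ = model
    q = (image_feature - mu_x) @ A
    T = (tag_vectors - mu_y) @ B
    sims = (T @ q) / (np.linalg.norm(T, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)

# Toy data (hypothetical): 4 tags with 3-d embeddings; image features are a
# fixed linear function of the tag embedding plus small Gaussian noise.
rng = np.random.default_rng(0)
tag_vectors = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0]])
W = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
labels = np.repeat(np.arange(4), 50)          # 50 training images per tag
Y = tag_vectors[labels]
X = Y @ W + rng.normal(scale=0.01, size=(200, 3))

model = fit_linear_cca(X, Y, k=3)
corrs = model[4]                               # canonical correlations
ranking = rank_tags(tag_vectors[2] @ W, tag_vectors, model)
```

Here `ranking` orders the candidate tags for a query image generated from tag 2, and `corrs` measures how strongly the two modalities are related along each shared dimension. A deep variant would replace the linear projections `A` and `B` with neural networks trained on the same correlation objective.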
Acknowledgments
This work is supported by the National Key Research and Development Plan (Grant No. 2016YFC0801003). Yuejie Zhang is the corresponding author.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Zhang, X., Jin, C., Zhang, Y., Zhang, T. (2016). Image Tag Recommendation via Deep Cross-Modal Correlation Mining. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_36
DOI: https://doi.org/10.1007/978-3-319-47674-2_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science, Computer Science (R0)