Abstract
Image tagging has attracted much research interest due to its wide range of applications. Many existing methods achieve impressive results, but they have two main limitations: (1) they focus only on tagging images and ignore the influence of tags on visual feature modeling; (2) they model tag correlation without considering the visual content of the image. In this paper, we propose a joint visual-semantic propagation model (JVSP) to address these two issues. First, we use joint visual-semantic modeling to obtain integrated features that accurately reflect the relationship between tags and image regions. Second, we introduce a visual-guided LSTM to capture the co-occurrence relations among tags. Third, we design a diversity loss that encourages the model to attend to different regions. Experimental results on three challenging datasets demonstrate that the proposed method yields significant performance gains over existing methods.
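The abstract does not state the form of the diversity loss, so the following is only a minimal sketch of one plausible way to penalize attention maps that fall on the same regions at different tagging steps. The function names `softmax` and `diversity_penalty`, the pairwise dot-product penalty, and the toy attention values are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw region scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def diversity_penalty(attention):
    """Hypothetical diversity term (an assumption, not the paper's loss).

    `attention` is a list of attention maps, one per tagging step, each a
    list of weights over image regions. The penalty sums the dot products
    between maps of different steps, so it is 0 when the steps attend to
    disjoint regions and grows as they overlap.
    """
    penalty = 0.0
    for i in range(len(attention)):
        for j in range(i + 1, len(attention)):
            penalty += sum(a * b for a, b in zip(attention[i], attention[j]))
    return penalty


# Toy example: two tagging steps over four image regions.
focused_apart = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
focused_same = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]

print(diversity_penalty(focused_apart))
print(diversity_penalty(focused_same))
```

Under this assumed formulation, adding the penalty to the tagging loss would push successive steps toward different regions; the loss actually used by JVSP may differ.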
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Ma, Y., Zhu, X., Sun, Y., Yan, B. (2018). Image Tagging by Joint Deep Visual-Semantic Propagation. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science(), vol 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_3
DOI: https://doi.org/10.1007/978-3-319-77380-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77379-7
Online ISBN: 978-3-319-77380-3
eBook Packages: Computer Science, Computer Science (R0)