The Latent Semantic Power of Labels: Improving Image Classification via Natural Language Semantic

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11956)

Abstract

One-hot encoding was introduced into image classification to avoid the difficulty of optimizing numerical labels, and it is now widely used in CNN-based models. However, one-hot encoding ignores the textual semantics of class labels, which relate closely to image characteristics and carry latent connections between images. Inspired by distributional-similarity-based representations in the Natural Language Processing community, we propose a framework that introduces Word2Vec into classic CNN models to improve image classification performance. By mining the latent semantic power of class labels, the framework feeds word vector representations of the labels into the classification model in place of the traditional one-hot encoding. In evaluation experiments on the CIFAR-10 and CIFAR-100 data sets, a series of representative CNNs were tested as the feature extraction component of the framework. Experimental results show that the proposed method improves classification accuracy.
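The contrast between one-hot targets and label word vectors can be illustrated with a small plain-Python sketch. It is not the authors' implementation: the 3-dimensional label vectors are hypothetical toy values standing in for pretrained Word2Vec embeddings, and the cosine loss and nearest-label decoding rule are assumptions, since the abstract does not state which loss or decision rule the framework uses. It shows two things: under one-hot encoding every pair of classes is equally dissimilar, while label vectors let "cat" sit closer to "dog" than to "truck"; and a network output is classified by finding the most similar label vector rather than by taking an argmax over one-hot scores.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def one_hot(label: str, classes: Sequence[str]) -> Vector:
    """Classic target: 1.0 at the label's position, 0.0 everywhere else."""
    return [1.0 if c == label else 0.0 for c in classes]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cosine_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Assumed training signal: 1 - cos(pred, target), 0 when directions match."""
    return 1.0 - cosine(pred, target)


def decode(pred: Sequence[float], label_vectors: Dict[str, Vector]) -> str:
    """Assumed decision rule: pick the class whose label vector is most similar."""
    return max(label_vectors, key=lambda c: cosine(pred, label_vectors[c]))


# Hypothetical toy "word vectors"; a real setup would look up each class name
# in a pretrained Word2Vec model (typically a few hundred dimensions).
LABEL_VECTORS: Dict[str, Vector] = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "truck": [0.1, 0.2, 0.9],
}

if __name__ == "__main__":
    classes = list(LABEL_VECTORS)
    # One-hot: any two different classes are orthogonal (similarity 0.0).
    print(cosine(one_hot("cat", classes), one_hot("dog", classes)))
    print(cosine(one_hot("cat", classes), one_hot("truck", classes)))
    # Label vectors: "cat" is much closer to "dog" than to "truck".
    cat, dog, truck = (LABEL_VECTORS[c] for c in ("cat", "dog", "truck"))
    print(cosine(cat, dog) > cosine(cat, truck))
    # A hypothetical CNN output projected into the label-vector space.
    output = [0.7, 0.95, 0.25]
    print(decode(output, LABEL_VECTORS))
    print(cosine_loss(output, dog) < cosine_loss(output, truck))
```

Running the script prints 0.0 twice, then True, then dog, then True. In this sketch the CNN is assumed to end in a layer whose output size equals the word-vector dimension, so that its output can be compared directly with the label vectors; a mistake between semantically related classes then costs less than one between unrelated classes, which one-hot targets cannot express.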

This research was supported by NSFC Projects 61672474 and 61501412, the National Science and Technology Major Project (No. 2017ZX05036-001-010), and the Science and Technology Planning Project of Guangdong Province, China (No. 2018B020207012).

Author information

Correspondence to Hong Yao.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Jia, H., Yao, H., Tian, T., Yan, C., Li, S. (2019). The Latent Semantic Power of Labels: Improving Image Classification via Natural Language Semantic. In: Milošević, D., Tang, Y., Zu, Q. (eds.) Human Centered Computing. HCC 2019. Lecture Notes in Computer Science, vol. 11956. Springer, Cham. https://doi.org/10.1007/978-3-030-37429-7_18

  • DOI: https://doi.org/10.1007/978-3-030-37429-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37428-0

  • Online ISBN: 978-3-030-37429-7

  • eBook Packages: Computer Science, Computer Science (R0)
