
Estimating the visual variety of concepts by referring to Web popularity

  • Published in: Multimedia Tools and Applications

Abstract

Increasingly sophisticated methods for data processing demand knowledge of the semantic relationship between language and vision. Emerging fields such as Explainable AI call for stepping away from black-box approaches and for understanding how the underlying semantics of datasets and AI models work. Advances in psycholinguistics suggest that language perception is related to how language production and sentence creation work. In this paper, a method to measure the visual variety of concepts is proposed in order to quantify the semantic gap between vision and language. For this, an image corpus is recomposed using ImageNet and Web data. Web-based metrics measuring the popularity of sub-concepts are used as weights to ensure that the image composition of the dataset is as natural as possible. Using clustering methods, a score describing the visual variety of each concept is determined. A crowd-sourced survey is conducted to create ground-truth values applicable to this research. The evaluations show that the recomposed image corpus substantially improves the measured variety compared to previous datasets. The results are promising and provide additional knowledge about the relationship between language and vision.
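The following is a minimal sketch of the idea described above, not the authors' implementation: images of a concept's sub-concepts are sampled in proportion to their Web popularity, the pooled image features are clustered, and the cluster distribution is summarized as a visual-variety score. All names (the popularity counts, the toy feature vectors, the entropy-based score) are hypothetical stand-ins for illustration.

```python
# Sketch only: popularity-weighted corpus composition and a clustering-based
# visual-variety score, assuming precomputed image feature vectors.
import numpy as np
from sklearn.cluster import MeanShift

def compose_weighted_corpus(features_by_subconcept, popularity, n_samples, seed=None):
    """Sample feature vectors per sub-concept in proportion to Web popularity."""
    rng = np.random.default_rng(seed)
    total = sum(popularity.values())
    pooled = []
    for name, feats in features_by_subconcept.items():
        feats = np.asarray(feats)
        k = max(1, round(n_samples * popularity[name] / total))
        idx = rng.choice(len(feats), size=min(k, len(feats)), replace=False)
        pooled.append(feats[idx])
    return np.vstack(pooled)

def visual_variety_score(features):
    """Cluster the pooled features; a broader cluster distribution means more variety."""
    labels = MeanShift().fit_predict(features)
    counts = np.bincount(labels)
    probs = counts / counts.sum()
    # Entropy of the cluster distribution as a simple variety measure.
    return float(-(probs * np.log(probs)).sum())

# Toy usage with random vectors standing in for image descriptors.
rng = np.random.default_rng(0)
features_by_subconcept = {
    "poodle": rng.normal(0, 1, (200, 16)),
    "bulldog": rng.normal(3, 1, (200, 16)),
}
popularity = {"poodle": 900_000, "bulldog": 300_000}  # e.g. Web hit counts
corpus = compose_weighted_corpus(features_by_subconcept, popularity, n_samples=300, seed=1)
print("visual variety score:", visual_variety_score(corpus))
```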



Acknowledgements

We are grateful to Dr. Kazuaki Nakamura of Osaka University, who provided expertise that greatly assisted this research.

Author information


Corresponding author

Correspondence to Marc A. Kastner.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Parts of this research were supported by the Ministry of Education, Culture, Sports, Science and Technology, Grant-in-Aid for Scientific Research, and a joint research project with NII, Japan.


About this article


Cite this article

Kastner, M.A., Ide, I., Kawanishi, Y. et al. Estimating the visual variety of concepts by referring to Web popularity. Multimed Tools Appl 78, 9463–9488 (2019). https://doi.org/10.1007/s11042-018-6528-x


  • DOI: https://doi.org/10.1007/s11042-018-6528-x
