Semantic image retrieval for complex queries using a knowledge parser

Multimedia Tools and Applications

Abstract

In order to improve the retrieval accuracy of image retrieval systems, research focus has shifted from designing sophisticated low-level feature extraction algorithms to combining image retrieval with rich semantics and knowledge-based methods. In this paper, we aim to improve text-based image retrieval for complex natural language queries by using a semantic parser, the Knowledge Parser (K-Parser). From text written in natural language, the K-Parser extracts a graphical semantic representation of the objects involved, their properties, and their relations. We analyze both the textual image captions and the natural language queries with the K-Parser. As a technical solution, we leverage RDF in two ways: first, we store the parsed image captions as RDF triples; second, we translate image queries into SPARQL queries. When we apply this approach to the Flickr8k dataset with a set of 16 custom queries, we observe that the K-Parser exhibits some biases that negatively affect query accuracy. We propose two techniques to address these weaknesses: (1) a set of rules that transform the output of the K-Parser and fix some basic, recurrent parsing mistakes that occur on the Flickr8k captions; (2) the use of two popular commonsense knowledge databases, ConceptNet and WordNet, to raise the accuracy of queries on broad concepts. Using these two techniques, we fix most of the initial retrieval errors and accurately execute our set of 16 queries on the Flickr8k dataset.
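The abstract describes storing parsed captions as RDF triples, translating queries into SPARQL, and using WordNet to broaden concepts. The following is a minimal, dependency-free Python sketch of that idea, not the authors' implementation. It replaces the RDF store and SPARQL engine with an in-memory set of triples and a small basic-graph-pattern matcher. The caption triples, the relation names (`agent`, `on`, `instance_of`), and the `HYPERNYMS` table are all hypothetical stand-ins for K-Parser output and a WordNet lookup.

```python
# Minimal sketch: caption triples plus SPARQL-style basic graph pattern matching.
# Relation names and the hypernym table are hypothetical stand-ins; they are
# not K-Parser's or WordNet's actual output.
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical parsed captions: one set of (subject, predicate, object) per image.
CAPTIONS: Dict[str, Set[Triple]] = {
    "img1": {("run_1", "agent", "dog_1"), ("dog_1", "instance_of", "dog"),
             ("run_1", "instance_of", "run"), ("run_1", "on", "grass_1"),
             ("grass_1", "instance_of", "grass")},
    "img2": {("sit_1", "agent", "man_1"), ("man_1", "instance_of", "man"),
             ("sit_1", "instance_of", "sit"), ("sit_1", "on", "bench_1"),
             ("bench_1", "instance_of", "bench")},
}

# Hypothetical hypernym table standing in for a WordNet lookup.
HYPERNYMS: Dict[str, Set[str]] = {"dog": {"animal"}, "man": {"person"}}


def expand(graph: Set[Triple]) -> Set[Triple]:
    """Add an instance_of triple for each (one-level) hypernym of a concept."""
    extra = {(s, "instance_of", h)
             for (s, p, o) in graph if p == "instance_of"
             for h in HYPERNYMS.get(o, ())}
    return graph | extra


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: List[Triple], graph: Set[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or fail if it is already bound differently.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)


def retrieve(pattern: List[Triple]) -> List[str]:
    """Return the IDs of images whose expanded caption graph matches."""
    return sorted(img for img, g in CAPTIONS.items()
                  if next(match(pattern, expand(g)), None) is not None)


# "An animal running on grass", written like a SPARQL WHERE clause.
query = [("?e", "instance_of", "run"), ("?e", "agent", "?x"),
         ("?x", "instance_of", "animal"), ("?e", "on", "?y"),
         ("?y", "instance_of", "grass")]
print(retrieve(query))  # ['img1']
```

Because `expand` adds hypernyms to each caption graph before matching, the broad query term "animal" matches a caption that only mentions "dog". This sketch expands the caption side; expanding the query terms instead would be an equally plausible design, and a real deployment would issue the equivalent SPARQL `SELECT` query against an RDF store.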

Figs. 1–11

Notes

  1. http://www.flickr.com

  2. http://kparser.org

  3. Refer to http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html

  4. Turtle (Terse RDF Triple Language) can display RDF triples in a concise format.

  5. More detailed experiments are presented in Sections 6 and 7.

  6. In our work, we only consider the relation “CapableOf” from ConceptNet.

  7. For an example method, refer to http://cs.stanford.edu/people/karpathy/deepimagesent/rankingdemo/

  8. In our work, we consider that “run on grass” and “run through grass” have the same meaning.
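Notes 4 and 8 mention two small processing details: RDF triples can be displayed concisely in Turtle, and "run on grass" and "run through grass" are treated as having the same meaning. The Python sketch below illustrates both under stated assumptions; it is not the authors' rule set. The `PREPOSITION_SYNONYMS` table, the triple layout, and the `ex:` namespace (`http://example.org/`) are hypothetical, and the serializer only handles simple tokens that are valid Turtle local names.

```python
# Sketch of two caption post-processing steps suggested by Notes 4 and 8.
# The "ex:" namespace, relation names, and synonym table are hypothetical.
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Note 8: treat "run through grass" as equivalent to "run on grass".
PREPOSITION_SYNONYMS: Dict[str, str] = {"through": "on"}


def normalize(triples: Set[Triple]) -> Set[Triple]:
    """Rewrite synonymous preposition relations to one canonical relation."""
    return {(s, PREPOSITION_SYNONYMS.get(p, p), o) for (s, p, o) in triples}


def to_turtle(triples: Set[Triple], prefix: str = "http://example.org/") -> str:
    """Serialize triples as Turtle, grouping predicates under each subject.

    Assumes every term is a plain token (letters, digits, underscores) so it
    can be written as a prefixed name without escaping.
    """
    by_subject: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in sorted(triples):
        by_subject.setdefault(s, []).append((p, o))
    lines = [f"@prefix ex: <{prefix}> ."]
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(f"ex:{p} ex:{o}" for p, o in pairs)
        lines.append(f"ex:{s} {body} .")
    return "\n".join(lines)


caption = {("run_1", "agent", "dog_1"), ("run_1", "through", "grass_1")}
print(to_turtle(normalize(caption)))
```

Normalizing before storage means a single query pattern using `on` also retrieves captions that originally said "through".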

Author information

Correspondence to Hua Chen.

About this article

Cite this article

Chen, H., Trouve, A., Murakami, K.J. et al. Semantic image retrieval for complex queries using a knowledge parser. Multimed Tools Appl 77, 10733–10751 (2018). https://doi.org/10.1007/s11042-017-4932-2

  • DOI: https://doi.org/10.1007/s11042-017-4932-2
