Abstract
To improve the accuracy of image retrieval systems, research focus has shifted from designing sophisticated low-level feature extraction algorithms to combining image retrieval with rich semantics and knowledge-based methods. In this paper, we aim to improve text-based image retrieval for complex natural language queries by using a semantic parser (Knowledge Parser or K-Parser). From text written in natural language, the K-Parser extracts a graphical semantic representation of the objects involved, their properties, and their relations. We analyze both the textual image captions and the natural language queries with the K-Parser. As a technical solution, we leverage RDF in two ways: first, we store the parsed image captions as RDF triples; second, we translate image queries into SPARQL queries. When applying this approach to the Flickr8k dataset with a set of 16 custom queries, we observe that the K-Parser exhibits some biases that negatively affect the accuracy of the queries. We propose two techniques to address these weaknesses: (1) we introduce a set of rules that transform the output of the K-Parser and fix some basic, recurrent parsing mistakes that occur on the captions of Flickr8k; (2) we leverage two popular commonsense knowledge databases, ConceptNet and WordNet, to raise the accuracy of queries on broad concepts. Using these two techniques, we can fix most of the initial retrieval errors and accurately execute our set of 16 queries on the Flickr8k dataset.
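The pipeline described in the abstract (captions stored as triples, queries expressed as graph patterns, broad concepts expanded through a knowledge base) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the relation names (`agent`, `location`, `instance_of`), the three captions, the `HYPONYMS` table standing in for WordNet/ConceptNet, and the functions `match` and `retrieve` are all hypothetical, and the small pattern matcher only mimics what a SPARQL basic graph pattern would do over an RDF store.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical parser output: image id -> (subject, relation, object) triples.
# The relation names are illustrative, not the K-Parser's actual vocabulary.
CAPTIONS: Dict[str, List[Triple]] = {
    "img1.jpg": [  # "A dog runs on the grass."
        ("run_1", "instance_of", "run"), ("run_1", "agent", "dog_1"),
        ("dog_1", "instance_of", "dog"), ("run_1", "location", "grass_1"),
        ("grass_1", "instance_of", "grass"),
    ],
    "img2.jpg": [  # "A cat sits on a sofa."
        ("sit_1", "instance_of", "sit"), ("sit_1", "agent", "cat_1"),
        ("cat_1", "instance_of", "cat"), ("sit_1", "location", "sofa_1"),
        ("sofa_1", "instance_of", "sofa"),
    ],
    "img3.jpg": [  # "A boy runs on the beach."
        ("run_1", "instance_of", "run"), ("run_1", "agent", "boy_1"),
        ("boy_1", "instance_of", "boy"), ("run_1", "location", "beach_1"),
        ("beach_1", "instance_of", "beach"),
    ],
}

# Toy stand-in for a commonsense knowledge base: broad concept -> narrower terms.
HYPONYMS = {"animal": {"dog", "cat"}}


def match(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield variable bindings that satisfy every pattern; '?x' marks a variable."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    head, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(head, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, triples, candidate)


def retrieve(subject_class: str, action: str, place: str) -> List[str]:
    """Return images whose caption graph matches '<subject> <action> on <place>'."""
    classes = {subject_class} | HYPONYMS.get(subject_class, set())
    hits = []
    for image, triples in CAPTIONS.items():
        for cls in sorted(classes):
            patterns = [
                ("?e", "instance_of", action), ("?e", "agent", "?a"),
                ("?a", "instance_of", cls), ("?e", "location", "?p"),
                ("?p", "instance_of", place),
            ]
            if next(match(patterns, triples), None) is not None:
                hits.append(image)
                break  # one matching class is enough for this image
    return hits


print(retrieve("animal", "run", "grass"))
```

Without the `HYPONYMS` expansion, the query on the broad concept "animal" would match none of these captions, since no caption mentions "animal" literally; this is the kind of retrieval error the second technique in the abstract is meant to address.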
Notes
Turtle (Terse RDF Triple Language) can display RDF triples in a concise format.
In our work, we only consider the relation “CapableOf” from ConceptNet.
One such method can be found at http://cs.stanford.edu/people/karpathy/deepimagesent/rankingdemo/
In our work, we consider that “run on grass” and “run through grass” have the same meaning.
Cite this article
Chen, H., Trouve, A., Murakami, K.J. et al. Semantic image retrieval for complex queries using a knowledge parser. Multimed Tools Appl 77, 10733–10751 (2018). https://doi.org/10.1007/s11042-017-4932-2
DOI: https://doi.org/10.1007/s11042-017-4932-2