Elsevier

Computers & Electrical Engineering

Volume 58, February 2017, Pages 537-550

An intelligent annotation-based image retrieval system based on RDF descriptions

https://doi.org/10.1016/j.compeleceng.2016.09.031

Highlights

  • The notions of concept and instance are proposed to express the semantics of images.

  • An image annotation model is proposed to annotate images at three levels.

  • An intelligent ABIR system is implemented based on RDF descriptions.

  • The problems of synonyms and homonyms are addressed in our ABIR system.

  • The proposed ABIR system provides a way to search with calculation.

Abstract

In this paper, we aim at improving text-based image search using Semantic Web technologies. We introduce our notions of concept and instance in order to better express the semantics of images, and present an intelligent annotation-based image retrieval system. We test our approach on the Flickr8k dataset. From the provided captions, we generate annotations at three levels (sentence, concept and instance). These annotations are stored as RDF triples and can be queried to find images. The experimental results show that annotating images flexibly with concepts and instances can improve the intelligence of the image retrieval system: (1) annotations at the concept level make it possible to create semantic links between concepts, which addresses challenges such as the problems of synonyms and homonyms; (2) annotations at the instance level make it possible to count things (e.g., “two people”, “three animals”) or to identify the same concept.

Introduction

With the development of information technologies, huge amounts of digital images are being generated very rapidly. Consequently, how to quickly and accurately find relevant images has become a very active research topic. Image retrieval solutions are generally classified into two types: content-based image retrieval (CBIR) solutions and annotation-based image retrieval (ABIR) solutions. Basically, CBIR solutions use visual features (such as color, texture, shape and object location) to retrieve images based on content properties. This technology has been widely used in many applications such as fingerprint identification, digital libraries, medicine and historical research, among others [1], [2]. All these applications are similar: given an input image, they boil down to finding similar images in a collection of images. However, because visual features cannot accurately represent the semantics of images, CBIR solutions suffer from the semantic gap problem [3]. On the other hand, ABIR solutions use textual descriptions as image metadata and find images with text retrieval techniques. The purpose of image annotation is to narrow the semantic gap between image visual features and semantics [4]. The crucial challenge is how to find relevant images accurately and intelligently.

Flickr is a famous image sharing website that gathers millions of photos. It provides several ways to retrieve images, including an ABIR solution that allows users to annotate and retrieve images by tags (a form of metadata). With this feature, users can find images easily and conveniently. However, in some cases, the retrieval results may be inaccurate. For example, when searching with “2 animals” or with “Apple” (a company name), the retrieval results are inaccurate; when searching with “TV” and with “television”, the retrieval results are different. The reasons may be as follows: (1) synonyms: “TV” and “television” have the same meaning, but machines cannot understand this; (2) homonyms: it is unknown whether “Apple” refers to a kind of fruit or a company name; and (3) counting: the system is unable to count things (e.g., “2 animals”). The problems of synonyms and homonyms have been mentioned many times by other researchers and have been solved in several ways, but the third problem is rarely discussed.
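The synonym and homonym problems above can be illustrated with a minimal sketch in plain Python. The label-to-concept mapping, the concept identifiers and the image annotations below are hypothetical, not the paper's actual vocabulary: the point is only that keyword search compares strings, whereas concept-based search first resolves a label to an unambiguous concept identifier.

```python
# Hypothetical mapping from surface labels to concept identifiers.
# Synonyms ("TV", "television") share one concept; homonyms ("Apple")
# map to several concepts that must be disambiguated.
label_to_concepts = {
    "TV":         ["ex:Television"],
    "television": ["ex:Television"],
    "Apple":      ["ex:AppleInc", "ex:AppleFruit"],
}

# Images annotated at the concept level (image id -> list of concepts).
annotations = {
    "img1": ["ex:Television"],
    "img2": ["ex:AppleFruit"],
    "img3": ["ex:AppleInc"],
}

def search(label, intended=None):
    """Resolve a label to concepts, optionally disambiguate, then match."""
    concepts = label_to_concepts.get(label, [])
    if intended is not None:                   # homonym disambiguation
        concepts = [c for c in concepts if c == intended]
    return sorted(img for img, cs in annotations.items()
                  if any(c in cs for c in concepts))

print(search("TV"))                            # ['img1']
print(search("television"))                    # ['img1'] -- same result as "TV"
print(search("Apple", "ex:AppleInc"))          # ['img3'] -- the company, not the fruit
```

Because both labels resolve to the same concept, “TV” and “television” return identical results, and an explicit intended concept separates the two readings of “Apple”.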

In this paper, we propose a new intelligent ABIR system. Our experiments are based on the Flickr8k [5] dataset, which consists of images extracted from the Flickr website, together with natural language sentences. We process those sentences and generate annotations for images at three levels of description: (1) sentence level: use natural language sentences, such as “two lizards fighting”; (2) concept level: use abstract or general concepts, such as “animal”, “cat” and “white cat”; and (3) instance level: use concrete and specific words, such as “1 cat”, “2 animals” and “a ‘my best buddy’ shirt”. We express annotations using the Resource Description Framework (RDF) [6]. RDF is a foundation for processing metadata; it is designed to describe resources and the relationships among them. Our main contributions are as follows: (1) we define our notions of concept and instance to describe images; (2) we propose an image annotation model to annotate images at three levels of description; (3) we count things (e.g., “2 animals”) intelligently in images with instances; and (4) we find things accurately by using unique identifiers of instances.
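The three-level annotation idea can be sketched as follows in plain Python, with triples represented as simple tuples; the URIs and predicate names are illustrative assumptions, not the system's actual schema (the real system stores such statements as RDF and queries them, e.g. with SPARQL).

```python
# A minimal sketch of annotating one image at three levels of description.
# Each annotation is a (subject, predicate, object) tuple; all names
# prefixed with "ex:" are hypothetical.
IMG = "ex:image_667626"

triples = [
    # sentence level: the full natural-language caption
    (IMG, "ex:sentence", "two lizards fighting"),
    # concept level: abstract/general concepts depicted in the image
    (IMG, "ex:concept", "ex:Lizard"),
    (IMG, "ex:concept", "ex:Animal"),
    # instance level: concrete individuals with unique identifiers
    (IMG, "ex:instance", "ex:lizard_1"),
    (IMG, "ex:instance", "ex:lizard_2"),
    ("ex:lizard_1", "rdf:type", "ex:Lizard"),
    ("ex:lizard_2", "rdf:type", "ex:Lizard"),
]

def instances_of(concept):
    """Return the instances that are typed with the given concept."""
    return [s for (s, p, o) in triples if p == "rdf:type" and o == concept]

# Answering "2 lizards" becomes counting instances typed with the concept:
print(len(instances_of("ex:Lizard")))   # 2
```

Because each instance carries a unique identifier, counting and identity checks reduce to simple operations over the triples, which keyword tags alone cannot support.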

The rest of this paper is organized as follows. Section 2 briefly introduces the dataset and the basic technologies used in our work. Section 3 reviews related work. In Section 4, we describe our image annotation model, the design of our system and the evaluation methodology. Section 5 presents the results and discussion. Section 6 gives the conclusion and future work.


RDF

RDF is a standard data model for describing Web resources within the Semantic Web. By using RDF, we can define and use metadata vocabularies to make statements about resources; furthermore, we can also create links between different resources. A resource can be anything that is identifiable by a Uniform Resource Identifier (URI); the statements describe the properties of resources, and the links indicate the relationships between resources. RDF uses a graph data model; an RDF graph can be
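As a small sketch of this graph data model (the URIs and property names below are illustrative, not taken from the paper), each statement is a subject-predicate-object triple, and resources shared between triples link the statements into a graph that can be traversed:

```python
# Each RDF statement is a (subject, predicate, object) triple; a set of
# triples forms a directed labeled graph. All names here are illustrative.
g = [
    ("ex:img42", "ex:depicts",       "ex:dog_1"),
    ("ex:dog_1", "rdf:type",        "ex:Dog"),
    ("ex:Dog",   "rdfs:subClassOf", "ex:Animal"),
]

def objects(subject, predicate):
    """Follow one edge: all objects reachable from subject via predicate."""
    return [o for (s, p, o) in g if s == subject and p == predicate]

# Following links from resource to resource traverses the graph:
node = objects("ex:img42", "ex:depicts")[0]    # ex:dog_1
print(objects(node, "rdf:type"))                # ['ex:Dog']
```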

Image annotation model

Most common search engines (e.g., Flickr, Google and Bing) use keyword-based search techniques. This approach is based on keyword annotations; each image is annotated by having a list of keywords associated with it [12]. With these keywords, images can be retrieved in an effective way. However, it has some disadvantages. For example, when searching with “animal”, images annotated with “dog” cannot be found.
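One common remedy, sketched below in plain Python, is to expand a query concept to its subconcepts before matching, so that searching for “animal” also retrieves images annotated with “dog”. The subclass relation and all names here are assumptions for illustration, not the paper's exact vocabulary.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
subclass_of = {"ex:Dog": "ex:Animal", "ex:Cat": "ex:Animal"}

# Images annotated at the concept level (image id -> list of concepts).
annotations = {"img1": ["ex:Dog"], "img2": ["ex:Cat"], "img3": ["ex:Car"]}

def expand(concept):
    """The concept itself plus every concept whose ancestor chain reaches it."""
    result = {concept}
    for child in subclass_of:
        c = child
        while c in subclass_of:            # walk up the hierarchy
            if subclass_of[c] == concept:
                result.add(child)
                break
            c = subclass_of[c]
    return result

def search(concept):
    wanted = expand(concept)
    return sorted(img for img, cs in annotations.items()
                  if any(c in wanted for c in cs))

# A plain keyword match on "ex:Animal" would find nothing, since no image
# is tagged with it directly; expansion recovers the intended results:
print(search("ex:Animal"))   # ['img1', 'img2']
```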

In [13], [14], an ontology-based image retrieval solution has been proposed to better

Proposed approach

Our proposed system is implemented in Java and consists of two modules (as shown in Fig. 3): a data preparation module and an image query module. We first introduce our image annotation model in 4.1, then describe the details of our system in 4.2 and 4.3; finally, the evaluation methodology is presented in 4.4.

Results and discussion

In this section, we present the retrieval results of our experiments. We list and analyze the results in 5.1 and then discuss the results in 5.2.

Conclusion and future work

In this paper we proposed an image annotation model with our notions of concept and instance, and presented an intelligent annotation-based image retrieval system based on RDF descriptions, with annotations at three levels.

The experimental results show that our system can retrieve images in an intelligent way with flexible annotations of concepts and instances. Our method addresses the problems of synonyms and homonyms; furthermore, it can count things or identify the same kind of thing with a

Hua Chen is a PhD student at Kyushu University, Japan. He received the B.E. degree from Beihang University, China, in 2005. He then worked as an engineer at NTT DATA Corporation and Fujitsu Kyushu Network Technologies Limited for seven years. He joined the Institute of Systems, Information Technologies and Nanotechnologies (ISIT) as an associate researcher in 2015.

References (24)

  • T. Dharani et al.

    A survey on content-based image retrieval

    2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering (PRIME)

    (2013)
  • L. Chu et al.

    Robust spatial consistency graph model for partial duplicate image retrieval

    IEEE Trans Multimedia

    (2013)
  • X. Zhang et al.

    Improving image tags by exploiting web search results

    Multimedia Tools Appl

    (2013)
  • X. Xu et al.

    Learning multi-task local metrics for image annotation

    Multimedia Tools Appl

    (2016)
  • M. Hodosh et al.

    Framing image description as a ranking task: Data, models and evaluation metrics

    J Artif Intell Res

    (2013)
  • Manola F., Miller E., McBride B. Resource Description Framework (RDF) Primer. W3C Recommendation, 10 February 2004....
  • Beckett D., Berners-Lee T., Prud’hommeaux E. Turtle - Terse RDF Triple Language. W3C Team Submission...
  • Prud’hommeaux E., Seaborne A. SPARQL Query Language for RDF. W3C Recommendation...
  • P. Bard et al.

The SESAME project: an overview and main results

    Proc. 13th World Conference on Earthquake Engineering, Vancouver

    (2004)
  • M. Grobe

RDF, Jena, SPARQL and the “Semantic Web”

    Proceedings of the 37th annual ACM SIGUCCS fall conference: communication and collaboration

    (2009)
  • O. Erling et al.

RDF support in the Virtuoso DBMS

    Networked knowledge-networked media

    (2009)
  • X. Xu et al.

    Image annotation with incomplete labelling by modelling image specific structured loss

    IEEJ Trans Electr Electronic Eng

    (2016)

    Antoine Trouve received the PhD degree from Kyushu University, Japan, in 2011, and the M.E. degree from ENSEIRB-MATMECA, Bordeaux, France, in 2006. He worked as a researcher at ISIT from 2007 to 2014, and then as an assistant professor at Kyushu University from 2014 to 2016. He is now working for the company AIBOD, Japan.

    Kazuaki J Murakami received his PhD, M.E. and B.E. degrees from Kyoto University, Japan. He worked at Kyushu University from 1987, and joined the Institute of Systems, Information Technologies and Nanotechnologies (ISIT) in 2001. He is now the vice-director of ISIT, and also an honorary professor of Kyushu University.

    Akira Fukuda received his PhD, M.E. and B.E. degrees from Kyushu University. He worked at NTT Corporation Musashino Laboratory from 1979 to 1983, and then at Kyushu University from 1983 to 1994. He was a professor at the Nara Institute of Science and Technology from 1994 to 2001. He is now a professor at Kyushu University.
