An intelligent annotation-based image retrieval system based on RDF descriptions
Introduction
With the development of information technologies, a huge number of digital images are being generated very rapidly. Consequently, finding relevant images quickly and accurately has become an active research topic. Image retrieval solutions are generally classified into two types: content-based image retrieval (CBIR) solutions and annotation-based image retrieval (ABIR) solutions. CBIR solutions use visual features (such as color, texture, shape and object location) to retrieve images based on their content. This technology has been widely used in applications such as fingerprint identification, digital libraries, medicine and historical research [1], [2]. All these applications share the same task: given an input image, find similar images in a collection. However, because visual features cannot accurately represent the semantics of images, CBIR solutions suffer from the semantic gap problem [3]. ABIR solutions, on the other hand, use textual descriptions as image metadata and find images with text retrieval techniques. The purpose of image annotation is to narrow the semantic gap between visual features and semantics [4]. The crucial challenge is how to find relevant images accurately and intelligently.
Flickr is a well-known image sharing website that hosts millions of photos. It offers several ways to retrieve images, including an ABIR approach that allows users to annotate and retrieve images by tags (a form of metadata). With this feature, users can find images easily and conveniently. However, in some cases the retrieval results may be inaccurate. For example, searching with “2 animals” or with “Apple” (the company) returns inaccurate results, and searching with “TV” and with “television” returns different results. Possible reasons are as follows: (1) synonyms: “TV” and “television” have the same meaning, but machines cannot recognize this; (2) homonyms: it is unknown whether “Apple” refers to the fruit or the company; and (3) counting: the system cannot count things (e.g., “2 animals”). The problems of synonyms and homonyms have been discussed many times by other researchers and addressed in several ways, but the third problem is rarely discussed.
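The synonym and homonym problems can be made concrete with a small sketch. The Java code below contrasts literal tag matching with matching on concept identifiers. The lexicon, the identifiers such as `concept:Television`, and the sample images are hypothetical; this illustrates the failure mode only and is not Flickr's search nor the method proposed in this paper.

```java
import java.util.*;

// Hypothetical sketch: literal tag matching vs. matching on concept identifiers.
// The lexicon, concept IDs and sample images are invented for illustration.
class TagMatchingDemo {
    // Surface word -> concept IDs. Synonyms share an ID; a homonym has several.
    static final Map<String, Set<String>> LEXICON = Map.of(
            "tv", Set.of("concept:Television"),
            "television", Set.of("concept:Television"),
            "apple", Set.of("concept:AppleFruit", "concept:AppleInc"));

    // Sample tag annotations: image ID -> lower-case tags.
    static final Map<String, Set<String>> IMAGES = Map.of(
            "img1", Set.of("tv"),
            "img2", Set.of("television"),
            "img3", Set.of("apple"));

    // Literal search: an image matches only if it carries the exact query string.
    static List<String> literalSearch(String query) {
        String q = query.toLowerCase();
        List<String> hits = new ArrayList<>();
        IMAGES.forEach((img, tags) -> {
            if (tags.contains(q)) hits.add(img);
        });
        Collections.sort(hits);
        return hits;
    }

    // Concept IDs for a word; unknown words map to no concept.
    static Set<String> senses(String word) {
        return LEXICON.getOrDefault(word.toLowerCase(), Set.of());
    }

    // Concept search: an image matches if any of its tags shares a concept ID with the query.
    static List<String> conceptSearch(String query) {
        Set<String> wanted = senses(query);
        List<String> hits = new ArrayList<>();
        IMAGES.forEach((img, tags) -> {
            boolean match = tags.stream()
                    .flatMap(t -> senses(t).stream())
                    .anyMatch(wanted::contains);
            if (match) hits.add(img);
        });
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(literalSearch("TV"));    // [img1]
        System.out.println(conceptSearch("TV"));    // [img1, img2]
        System.out.println(senses("Apple").size()); // 2 -> ambiguous, needs disambiguation
    }
}
```

Literal matching finds only the image tagged “tv”, while concept matching also finds the one tagged “television”. For “Apple” the lexicon yields two senses, so a separate disambiguation step (not shown) would still be needed.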
In this paper, we propose a new intelligent ABIR system. Our experiments are based on the Flickr8K [5] dataset, which consists of images extracted from the Flickr website together with natural language sentences describing them. We process these sentences and generate annotations for images at three levels of description: (1) sentence level: natural language sentences, such as “two lizards fighting”; (2) concept level: abstract or general concepts, such as “animal”, “cat” and “white cat”; and (3) instance level: concrete and specific words, such as “1 cat”, “2 animals” and “a ‘my best buddy’ shirt”. We express annotations using the Resource Description Framework (RDF) [6]. RDF is a foundation for processing metadata; it is designed to describe resources and the relationships among them. Our main contributions are as follows: (1) we define our notions of concept and instance to describe images; (2) we propose an image annotation model to annotate images at three levels of description; (3) we count things (e.g., “2 animals”) in images intelligently by using instances; and (4) we find things accurately by using unique identifiers of instances.
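As a rough illustration of how instance-level annotations make counting possible, the following Java sketch stores the three levels of description for one image and counts distinct instances whose concept is the queried concept, either directly or through an is-a chain. The record names, instance IDs such as `lizard#1`, and the small concept hierarchy are hypothetical; the sketch only assumes what the paragraph states, namely that instances carry unique identifiers and belong to concepts.

```java
import java.util.*;

// Hypothetical sketch of a three-level image annotation and of counting
// instances per concept. Names, IDs and the hierarchy are illustrative only.
class InstanceCounting {
    // Instance level: a concrete thing with a unique ID and the concept it belongs to.
    record Instance(String id, String concept) {}

    // All three levels of description for one image.
    record ImageAnnotation(String imageId,
                           List<String> sentences,      // sentence level
                           Set<String> concepts,        // concept level
                           List<Instance> instances) {} // instance level

    // Hypothetical is-a hierarchy: child concept -> parent concept.
    static final Map<String, String> PARENT = Map.of(
            "cat", "animal", "dog", "animal", "lizard", "animal");

    // True if `concept` equals `target` or is a transitive sub-concept of it.
    static boolean isA(String concept, String target) {
        for (String c = concept; c != null; c = PARENT.get(c)) {
            if (c.equals(target)) return true;
        }
        return false;
    }

    // Number of distinct instances (by ID) that belong to `target`.
    static long count(ImageAnnotation a, String target) {
        return a.instances().stream()
                .filter(i -> isA(i.concept(), target))
                .map(Instance::id)
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        ImageAnnotation img = new ImageAnnotation(
                "img-001",
                List.of("two lizards fighting"),
                Set.of("animal", "lizard"),
                List.of(new Instance("lizard#1", "lizard"),
                        new Instance("lizard#2", "lizard")));
        // A query such as "2 animals" is treated here as an exact count.
        System.out.println(count(img, "animal") == 2); // true
        System.out.println(count(img, "cat"));         // 0
    }
}
```

Because each lizard is a separate instance with its own identifier, a query like “2 animals” can be answered by counting, which a flat tag list cannot do. Treating the number in the query as an exact count rather than a lower bound is a design choice of this sketch.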
The rest of this paper is organized as follows. Section 2 briefly introduces the dataset and the basic technologies used in our work. Section 3 reviews related work. In Section 4, we describe our image annotation model, the design of our system and the evaluation methodology. Section 5 presents the results and discussion, and Section 6 gives the conclusion and future work.
RDF
RDF is a standard data model for describing Web resources within the Semantic Web. With RDF, we can define and use metadata vocabularies to make statements about resources, and we can also create links between different resources. A resource can be anything identifiable by a Uniform Resource Identifier (URI); statements describe the properties of resources, and links indicate the relationships between resources. RDF uses a graph data model; an RDF graph can be
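RDF statements are triples of subject, predicate and object. The Java sketch below keeps such triples in memory and matches a single triple pattern, with `null` standing in for a variable, roughly as one SPARQL triple pattern would. It is a toy under stated assumptions: the `example.org` URIs and the `depicts` and `caption` predicates are hypothetical, literals are not distinguished from URIs, and a real system would rely on an RDF store such as Jena or Virtuoso, both cited in the references.

```java
import java.util.*;

// Toy in-memory RDF-style graph: a list of (subject, predicate, object) triples.
// The example.org URIs and predicates are hypothetical.
class TinyTripleStore {
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    // Match one triple pattern; a null position acts as a wildcard,
    // roughly like a variable in a SPARQL triple pattern.
    List<Triple> match(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.s()))
                    && (p == null || p.equals(t.p()))
                    && (o == null || o.equals(t.o()))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String ex = "http://example.org/";
        TinyTripleStore g = new TinyTripleStore();
        // A link between two resources: the image depicts an instance.
        g.add(ex + "image/1", ex + "depicts", ex + "instance/cat1");
        // A statement typing the instance with a concept.
        g.add(ex + "instance/cat1", RDF_TYPE, ex + "concept/Cat");
        // A property whose value is a literal (kept as a plain string in this sketch).
        g.add(ex + "image/1", ex + "caption", "a white cat sleeping on a sofa");

        // Everything stated about image/1.
        for (Triple t : g.match(ex + "image/1", null, null)) {
            System.out.println(t.p() + " -> " + t.o());
        }
    }
}
```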
Image annotation model
Most common search engines (e.g., Flickr, Google and Bing) use keyword-based search techniques. This approach relies on keyword annotations: each image is annotated with a list of associated keywords [12], and images can then be retrieved effectively by matching these keywords. However, the approach has drawbacks. For example, a search for “animal” cannot find images annotated only with “dog”.
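The limitation can be reproduced in a few lines. This hypothetical Java sketch runs a plain keyword search and a search that first expands the query with all sub-concepts from a small is-a hierarchy. The expansion is one common remedy in the spirit of ontology-based retrieval, not necessarily the approach of the works cited here, and the hierarchy and image data are invented for illustration.

```java
import java.util.*;

// Hypothetical sketch: plain keyword search vs. query expansion with sub-concepts.
class KeywordVsConceptSearch {
    // Keyword annotations: image ID -> keywords.
    static final Map<String, List<String>> KEYWORDS = Map.of(
            "img1", List.of("dog", "grass"),
            "img2", List.of("cat", "sofa"),
            "img3", List.of("car", "street"));

    // Hypothetical hierarchy: concept -> direct sub-concepts.
    static final Map<String, List<String>> CHILDREN = Map.of(
            "animal", List.of("dog", "cat"));

    // Keyword search: an image matches only if it has the exact query keyword.
    static List<String> keywordSearch(String q) {
        return search(Set.of(q));
    }

    // Concept search: match the query plus all transitive sub-concepts (breadth-first).
    static List<String> conceptSearch(String q) {
        Set<String> expanded = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(q));
        while (!todo.isEmpty()) {
            String c = todo.poll();
            if (expanded.add(c)) todo.addAll(CHILDREN.getOrDefault(c, List.of()));
        }
        return search(expanded);
    }

    // Images having at least one keyword in `terms`, sorted by ID.
    private static List<String> search(Set<String> terms) {
        List<String> hits = new ArrayList<>();
        KEYWORDS.forEach((img, kws) -> {
            if (kws.stream().anyMatch(terms::contains)) hits.add(img);
        });
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(keywordSearch("animal")); // []
        System.out.println(conceptSearch("animal")); // [img1, img2]
    }
}
```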
In [13], [14], ontology-based image retrieval solutions have been proposed to better
Proposed approach
Our proposed system is implemented in Java and consists of two modules (shown in Fig. 3): a data preparation module and an image query module. We first introduce our image annotation model in Section 4.1, then describe the details of our system in Sections 4.2 and 4.3, and finally present the evaluation methodology in Section 4.4.
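The data flow between the two modules can be sketched as follows. In this hypothetical Java version, the data preparation step turns captions (as provided by Flickr8K) into per-image concept counts, and the query step answers counting queries over them. The vocabulary, the number-word table, the naive plural rule, the choice of keeping the largest count across captions, and the use of plain maps instead of RDF are all assumptions of this sketch; the actual modules shown in Fig. 3 may work quite differently.

```java
import java.util.*;

// Hypothetical two-module pipeline: data preparation -> image query.
// Vocabulary, number words and the plural rule are assumptions, not the authors' code.
class TwoModulePipeline {
    static final Set<String> CONCEPTS = Set.of("dog", "cat", "lizard");
    static final Map<String, Integer> NUMBERS = Map.of(
            "a", 1, "an", 1, "one", 1, "two", 2, "three", 3);

    // Data preparation: captions per image -> (concept -> count).
    // Naive rule: a number word immediately followed by a known noun,
    // optionally with a trailing "s"; across captions, keep the largest count.
    static Map<String, Map<String, Integer>> prepare(Map<String, List<String>> captions) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        captions.forEach((img, sentences) -> {
            Map<String, Integer> counts = index.computeIfAbsent(img, k -> new TreeMap<>());
            for (String sentence : sentences) {
                String[] w = sentence.toLowerCase().split("\\s+");
                for (int i = 0; i + 1 < w.length; i++) {
                    Integer n = NUMBERS.get(w[i]);
                    if (n == null) continue;
                    String next = w[i + 1];
                    String noun = next.endsWith("s") ? next.substring(0, next.length() - 1) : next;
                    if (CONCEPTS.contains(noun)) counts.merge(noun, n, Integer::max);
                }
            }
        });
        return index;
    }

    // Image query: images whose prepared count for `concept` equals `n`.
    static List<String> query(Map<String, Map<String, Integer>> index, int n, String concept) {
        List<String> hits = new ArrayList<>();
        index.forEach((img, counts) -> {
            if (counts.getOrDefault(concept, 0) == n) hits.add(img);
        });
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> captions = Map.of(
                "img1", List.of("two lizards fighting", "a lizard bites another lizard"),
                "img2", List.of("a dog runs on the grass"));
        Map<String, Map<String, Integer>> index = prepare(captions);
        System.out.println(index);                     // {img1={lizard=2}, img2={dog=1}}
        System.out.println(query(index, 2, "lizard")); // [img1]
    }
}
```

Keeping the two steps separate means the relatively expensive caption processing runs once, offline, while queries only read the prepared index.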
Results and discussion
In this section, we present the retrieval results of our experiments. We list and analyze the results in Section 5.1 and discuss them in Section 5.2.
Conclusion and future work
In this paper, we proposed an image annotation model with our notions of concept and instance, and presented an intelligent annotation-based image retrieval system based on RDF descriptions, with annotations at three levels of description.
The experimental results show that our system can retrieve images intelligently using flexible concept and instance annotations. Our method addresses the problems of synonyms and homonyms; furthermore, it can count things or identify a particular thing among things of the same kind with a
References (24)
- et al. A survey on content based image retrieval. Pattern recognition, informatics and mobile engineering (PRIME), 2013 international conference on (2013).
- et al. Robust spatial consistency graph model for partial duplicate image retrieval. IEEE Trans Multimedia (2013).
- et al. Improving image tags by exploiting web search results. Multimedia Tools Appl (2013).
- et al. Learning multi-task local metrics for image annotation. Multimedia Tools Appl (2016).
- et al. Framing image description as a ranking task: Data, models and evaluation metrics. J Artif Intell Res (2013).
- Manola F., Miller E., McBride B. Resource description framework (RDF) primer. W3C recommendation, 10 February 2004....
- Beckett D., Berners-Lee T., Prud’hommeaux E. Turtle: terse RDF triple language. W3C Team Submission...
- Prud’hommeaux E., Seaborne A. SPARQL query language for RDF. W3C recommendation...
- et al. The sesame project: an overview and main results. Proc. 13th world conf. Earth. engng., Vancouver (2004).
- RDF, Jena, SPARQL and the “semantic web”. Proceedings of the 37th annual ACM SIGUCCS fall conference: communication and collaboration (2009).
- RDF support in the Virtuoso DBMS. Networked knowledge-networked media.
- Image annotation with incomplete labelling by modelling image specific structured loss. IEEJ Trans Electr Electronic Eng.
Hua Chen is a PhD student at Kyushu University, Japan. He received the B.E. degree from Beihang University, China, in 2005. He then worked as an engineer at NTT DATA Corporation and Fujitsu Kyushu Network Technologies Limited for seven years. He joined the Institute of Systems, Information Technologies and Nanotechnologies (ISIT) as an associate researcher in 2015.
Antoine Trouve received the PhD degree from Kyushu University, Japan, in 2011, and the M.E. degree from ENSEIRB-MATMECA, Bordeaux, France, in 2006. He worked as a researcher at ISIT from 2007 to 2014, and then as an assistant professor at Kyushu University from 2014 to 2016. He now works for the company AIBOD, Japan.
Kazuaki J Murakami received his PhD, M.E. and B.E. degrees from Kyoto University, Japan. He joined Kyushu University in 1987 and the Institute of Systems, Information Technologies and Nanotechnologies (ISIT) in 2001. He is now the vice-director of ISIT and an honorary professor of Kyushu University.
Akira Fukuda received his PhD, M.E. and B.E. degrees from Kyushu University. He worked at NTT Corporation Musashino Laboratory from 1979 to 1983 and at Kyushu University from 1983 to 1994. He was a professor at the Nara Institute of Science and Technology from 1994 to 2001 and is now a professor at Kyushu University.