A Multi-Directional Search technique for image annotation propagation

https://doi.org/10.1016/j.jvcir.2011.10.004

Abstract

Image annotation has attracted significant attention due to its importance in image understanding and search. In this paper, we propose a novel Multi-Directional Search framework for semi-automatic annotation propagation. In this system, the user interacts with the system by providing example images and their corresponding annotations during the annotation propagation process. In each iteration, the example images are clustered and the corresponding annotations are propagated separately to each cluster: images in each local neighborhood are annotated, and some of those images are returned to the user for further annotation. As the user marks more images, the annotation process proceeds in multiple directions in the feature space. The query movements can be treated as multiple-path navigation, and each path can be further split based on the user’s input. In this manner, the system provides accurate annotation assistance to the user: images with the same semantic meaning but different visual characteristics can be handled effectively. Comprehensive experiments on the Corel and University of Washington image databases demonstrate the accuracy and efficiency of the proposed technique in annotating image databases.

Highlights

  • We propose a Multi-Directional Search (MDS) technique for annotation propagation.

  • MDS leverages user relevance feedback to annotate images with the same semantic meaning but different visual characteristics.

  • As the query can be hierarchically split into sub-queries, the annotation speed increases in later rounds.

  • MDS also considers the user’s previous query intention to better exploit the intent of user feedback.

  • MDS can effectively handle uncaptioned databases with unlimited vocabularies by using the user’s input.

Introduction

Due to the popularity of digital cameras and improved communication infrastructure, digital images have become widely available around the world. To provide better search and sharing services for this massive number of images, detailed and accurate annotations are required. However, manually annotating images is a tedious and expensive process, so automatic or semi-automatic annotation techniques are highly desirable [1], [2], [3], [6], [14], [17], [19], [26], [27], [30], [35].

Recently, most annotation research has focused on learning the mapping between images and keywords [1], [3]. However, only a very limited number of concepts can be modeled on a small-scale image database by learning projections or correlations between images and keywords. This approach is not applicable to web-scale image databases with potentially unlimited vocabularies.

Another image annotation approach is based on the image labels of an annotated database [16], [35]. The basic idea is to build an initial image database with labels, where the labels come from the images’ surrounding text on a web page or from a vocabulary. The labels are then propagated to visually similar images, which are found by content-based image retrieval or by clustering on the images’ visual features. A potential weakness of these approaches is that the surrounding text can be incomplete and/or inconsistent with the actual image content, especially on spam pages; from this point of view, the surrounding text is not always reliable.

From the above discussion we can see that, to annotate a large-scale image database with unlimited vocabularies, the user’s Relevance Feedback (RF) should be utilized as a trustworthy source of annotations. At the same time, an accurate CBIR technique with user RF is needed for efficient annotation propagation. CBIR with RF has been an active research area [8], [10], [12], [24], [29] for over a decade. Essentially, most existing techniques are based on the kNN (k nearest neighbor) model: two images are considered similar if they are near each other in the feature space under some distance measure. In each round of RF, the images the user identifies as relevant are used to retrieve the k nearest images for the next round of feedback. Different techniques have been proposed for computing these k nearest images.
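To make this baseline concrete, the following is a minimal sketch of a single kNN feedback round, assuming the database and the user-marked relevant images are represented as precomputed visual feature vectors (NumPy arrays); the centroid-based query update used here is only one common choice among the many proposed variants:

```python
import numpy as np

def rf_round(database_feats, relevant_feats, k=20):
    """One round of single-query kNN relevance feedback: the relevant
    images are collapsed into one query point (their centroid), and the
    k database images nearest to it under Euclidean distance are
    returned for the next round of user feedback."""
    query = relevant_feats.mean(axis=0)               # single query point
    dists = np.linalg.norm(database_feats - query, axis=1)
    return np.argsort(dists)[:k]                      # indices of the k nearest images
```

Note that, whatever the query-update rule, this model always searches a single neighborhood around one query point, which is precisely the weakness discussed next.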

However, the standard kNN model employed by today’s relevance feedback techniques has one inherent weakness. Due to the semantic gap, semantically similar images may have different appearances and are not necessarily located close to each other in the feature space. Since traditional CBIR techniques search for a single best-matching neighborhood, their performance suffers when the relevant images lie in multiple neighborhoods far apart in the feature space. For instance, if the user provides visually different but semantically identical example images, those images should be treated separately in the CBIR process.

To address the aforementioned limitations, we propose a Multi-Directional Search (MDS) technique. In this system, a user provides sample images with labels as the query for annotation propagation. First, local clustering is performed on the example images based on their visual features. If the user’s RF images have very different visual features (indicating an intention to annotate groups of images with diverse visual characteristics), the current query is decomposed into multiple sub-queries accordingly, to better exploit the feedback information and provide more precise annotation. As in CBIR, the neighboring images of the multiple sub-queries are returned to the user for further RF, and the sub-queries may be decomposed again. This mechanism enables the proposed technique to explore all relevant neighborhoods and cover the k best-matching images wherever they may be, rather than just the top-k ranked images of the single best-matching cluster (while omitting other relevant clusters). The label of the example images is propagated to the top-ranked images in the result set. This iterative process repeats until the user stops.
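The sketch below illustrates the decomposition step under stated assumptions: k-means stands in for the paper’s local clustering, and the per-sub-query result lists are merged by each image’s best distance to any sub-query point; the paper’s exact procedure is described in Section 4.

```python
import numpy as np
from sklearn.cluster import KMeans

def decompose_query(relevant_feats, n_subqueries):
    """Cluster the user's relevant images and spawn one sub-query
    (the cluster centroid) per cluster, so that each visually coherent
    group is searched in its own neighborhood."""
    km = KMeans(n_clusters=n_subqueries, n_init=10, random_state=0)
    km.fit(relevant_feats)
    return km.cluster_centers_                        # one sub-query point per cluster

def search_subqueries(database_feats, subqueries, k=20):
    """Retrieve the k nearest database images for every sub-query and
    merge the result lists, ranking each image by its smallest distance
    to any sub-query point."""
    best = {}
    for q in subqueries:
        dists = np.linalg.norm(database_feats - q, axis=1)
        for idx in np.argsort(dists)[:k]:
            best[int(idx)] = min(best.get(int(idx), np.inf), float(dists[idx]))
    return sorted(best, key=best.get)                 # best-first merged candidate list
```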

The primary contributions of this paper are as follows:

  • The MDS technique leverages the user’s RF to precisely and efficiently annotate images with the same semantic meaning but different visual characteristics.

  • As the user’s query can be hierarchically split into multiple sub-queries, the annotation speed increases with each new round.

  • With the user’s interaction, the annotations are propagated to the retrieved images near and on the query paths. In this manner, the proposed technique can effectively handle uncaptioned databases with unlimited vocabularies.

The remainder of this paper is organized as follows. Related work is discussed in Section 2. We give a system overview in Section 3. We then explain the proposed Multiple Directions Search technique in Section 4. The annotation mechanism is described in Section 5. Experimental results are discussed in Section 6. Finally, we conclude our study in Section 7.

Section snippets

Related work

Traditional annotation research focuses on learning the mapping between images and keywords [1], [3]. For example, Duygulu et al. [6] proposed a translation model to label image blobs. In [14], a cross-media relevance model is employed to predict the probability of generating a word given the blobs in an image. When each word is treated as a distinct class, image annotation can be viewed as a multi-class classification problem. The basic idea is to learn the most

System overview

The proposed MDS annotation process is depicted in Fig. 1. The whole process starts from an unlabeled database. Upon being presented with randomly selected database images, the user selects relevant images and annotates them. Local clustering (Multiple Directions Search, described in the next section) is then performed on the relevant images. If the clustering results in more than one cluster, the initial query is decomposed into multiple sub-queries, each representing a cluster, and the sub-query points are
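Under the same assumptions as the sketches in the Introduction, the overall loop of Fig. 1 might look as follows; `get_user_feedback` is a hypothetical callback standing in for the interactive step, `decompose_query` and `search_subqueries` refer to the earlier sketch, and `choose_k` is a splitting heuristic sketched in the next section:

```python
import numpy as np

def mds_annotation_loop(database_feats, get_user_feedback, k=20, seed=0):
    """High-level sketch of the iterative MDS annotation process.
    `get_user_feedback(indices)` shows the given images to the user and
    returns (relevant_indices, label), or None when the user stops."""
    rng = np.random.default_rng(seed)
    annotations = {}                                   # image index -> set of labels
    shown = rng.choice(len(database_feats), size=k, replace=False)
    while (feedback := get_user_feedback(shown)) is not None:
        relevant, label = feedback
        feats = database_feats[np.asarray(relevant)]
        subqueries = decompose_query(feats, n_subqueries=choose_k(feats))
        ranked = search_subqueries(database_feats, subqueries, k=k)
        for idx in ranked[:k]:                         # propagate to the top-ranked images
            annotations.setdefault(idx, set()).add(label)
        shown = ranked[:k]                             # return them for further feedback
    return annotations
```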

Multiple Directions Search (MDS)

In this section we describe the detailed process of MDS. First, the user provides example images. Second, the system determines, based on their visual features, whether the images are diverse and should be processed separately. If they are diverse, the dynamic directional clustering mechanism decomposes the user’s initial query into multiple sub-queries, splitting the annotation direction into multiple paths. Third, combining the user’s previous behavior by exploiting the directional
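The paper’s exact splitting criterion is given in the full text (the references include Hamerly and Elkan’s work on learning the k in k-means); purely as an illustrative stand-in, a silhouette-based model selection can decide whether the feedback images are diverse enough to split, and into how many sub-queries:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_k(feats, max_k=4, threshold=0.25):
    """Decide how many sub-queries the feedback images should be split
    into. If no multi-cluster partition scores above `threshold`, the
    images are treated as one visually coherent query (k = 1)."""
    n = len(feats)
    best_k, best_score = 1, threshold
    for k in range(2, min(max_k, n - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)
        score = silhouette_score(feats, labels)       # in [-1, 1]; higher = cleaner split
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```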

Annotation propagation with MDS

In this section, we describe how the user-specified annotations are propagated to images in the database. Using the MDS technique, the user can navigate into multiple image neighborhoods according to his/her interest. Along those navigation paths, the annotations of the example images are propagated.
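The paper propagates annotations to the top-ranked images along the navigation paths (as in the loop sketch above); the variant below, with a hypothetical distance threshold `radius`, illustrates the same idea as a neighborhood-based rule:

```python
import numpy as np

def propagate_labels(database_feats, subqueries, label, annotations, radius):
    """Attach `label` to every database image that falls within `radius`
    of any sub-query point, i.e. the local neighborhoods lying along the
    user's navigation paths."""
    for q in subqueries:
        dists = np.linalg.norm(database_feats - q, axis=1)
        for idx in np.flatnonzero(dists <= radius):
            annotations.setdefault(int(idx), set()).add(label)
    return annotations
```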

Experimental study

In this section, we discuss the experimental study of the proposed MDS annotation propagation system. We evaluated the system’s accuracy in terms of precision and recall, and its efficiency in propagating annotations.
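For reference, per-label precision and recall over the propagated annotations can be computed as simple set ratios (a sketch; the paper’s exact evaluation protocol is described in the full text):

```python
def precision_recall(predicted, ground_truth):
    """Set-based precision/recall for one label: `predicted` holds the
    image indices the system annotated with the label, `ground_truth`
    the images that truly carry it."""
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```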

Conclusion

In this paper, we proposed a Multi-Directional Search (MDS) technique. The MDS approach dynamically analyzes the user’s relevance feedback and considers multiple neighborhoods by decomposing the initial query into separate sub-queries. As the sub-queries cover distinct image clusters scattered across the feature space, MDS addresses the inherent weakness of kNN-based RF techniques, which confine the search to a single neighboring image cluster. Therefore, MDS can simultaneously annotate images with different

References (35)

  • K. Barnard, P. Duygulu, D. Forsyth, N. Freitas, D. Blei, M. Jordan, Matching words and pictures, Journal of Machine...
  • D. Blei, M.I. Jordan, Modeling annotated data, in: Proceedings of ACM SIGIR,...
  • G. Carneiro, N. Vasconcelos, A database centric view of semantic image annotation and retrieval, in: Proceedings of ACM...
  • Y. Cao, C. Wang, Z. Li, L. Zhang, L. Zhang, Spatial-bag-of-features, in: Proceedings of CVPR,...
  • L. Cao, J. Yu, J. Luo, T. Huang, Enhancing semantic and geographic annotation of web images via logistic canonical...
  • P. Duygulu, K. Barnard, N. Freitas, D. Forsyth, Object recognition as machine translation: learning a lexicon for a...
  • J. French, X.-Y. Jin, An empirical investigation of the scalability of a multiple viewpoint CBIR system, in:...
  • T. Gevers et al., Content-Based Image Retrieval: An Overview (2004)
  • G. Hamerly, C. Elkan, Learning the k in k-means, in: Proceedings of NIPS,...
  • S.C.H. Hoi et al., A unified log-based relevance feedback scheme for image retrieval, IEEE Transactions on Knowledge and Data Engineering (2006)
  • L. Hollink, G. Schreiber, J. Wielemaker, B. Wielinga, Semantic annotation of image collections, in: Proceedings of...
  • K.A. Hua, N. Yu, D.Z. Liu, Query decomposition: a multiple neighborhood approach to relevance feedback in content-based...
  • Y. Ishikawa, R. Subramanya, C. Faloutsos, Mindreader: querying databases through multiple examples, in: Proceedings of...
  • J. Jeon, R. Manmatha, Automatic image annotation of news images with large vocabularies and low quality training data,...
  • D.H. Kim, C.W. Chung, Qcluster: relevance feedback using adaptive clustering for content-based image retrieval, in:...
  • X. Li, L. Chen, L. Zhang, F. Lin, W.-Y. Ma, Image annotation by large-scale content-based image retrieval, in:...
  • B.T. Li, K. Goh, E. Chang, Confidence-based dynamic ensemble for image annotation and semantics discovery, in:...