Journal of Visual Communication and Image Representation
A Multi-Directional Search technique for image annotation propagation
Highlights
► In this paper, we propose a Multi-Directional Search (MDS) technique. ► MDS leverages user relevance feedback (RF) to annotate images with the same semantic meaning but different visual characteristics. ► As the query can be hierarchically split into sub-queries, the annotation speed increases with each new round. ► MDS also considers the user’s previous query intention to better exploit the intent of user feedback. ► MDS can effectively handle an uncaptioned database with an unlimited vocabulary by using the user’s input.
Introduction
Due to the popularity of digital cameras and improved communication infrastructure, digital images have become widely available around the world. To provide better search and sharing services for this massive number of images, detailed and accurate annotations are required. However, manually annotating images is a tedious and expensive process, so automatic or semi-automatic annotation techniques are highly desirable [1], [2], [3], [6], [14], [17], [19], [26], [27], [30], [35].
Recently, most annotation research has focused on learning the mapping between images and keywords [1], [3]. However, only a very limited number of concepts can be modeled on a small-scale image database by learning the projections or correlations between images and keywords. This approach is not applicable to web-scale image databases with potentially unlimited vocabularies.
Another image annotation approach is based on the image labels of an annotated database [16], [35]. The basic idea is to build an initial labeled image database, where the labels come from each image’s bounding text on a web page or from a vocabulary. The labels are then propagated to visually similar images, which are found by content-based image retrieval or by clustering on the images’ visual features. A potential weakness of these approaches is that the bounding text can be incomplete and/or inconsistent with the real image content, especially on spam pages. From this point of view, the bounding text is not always reliable.
From the above discussion, we can see that to annotate a large-scale image database with an unlimited vocabulary, the user’s Relevance Feedback (RF) should be utilized as a trustworthy source of annotations. At the same time, an accurate content-based image retrieval (CBIR) technique with user RF should be employed for efficient annotation propagation. CBIR with RF has been an active area of research [8], [10], [12], [24], [29] for decades. Essentially, most existing techniques are based on the kNN (k nearest neighbor) model: two images are considered similar if they are near each other in the feature space under some distance measure. In each round of RF, the images the user identifies as relevant are used to retrieve the k nearest images for the next round of feedback. Different techniques have been proposed for computing these k nearest images.
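The kNN-with-feedback loop described above can be sketched as follows. This is an illustrative toy implementation of a generic RF baseline, not the paper’s system; the function names (`knn`, `refine_query`) and the Rocchio-style query-point movement are our own assumptions about a typical kNN-based RF scheme.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(db, query, k):
    """Return the indices of the k database images nearest to the query point."""
    ranked = sorted(range(len(db)), key=lambda i: euclidean(db[i], query))
    return ranked[:k]

def refine_query(query, relevant, alpha=0.5):
    """Rocchio-style query-point movement: shift the query toward the
    centroid of the images the user marked as relevant."""
    centroid = [sum(dim) / len(relevant) for dim in zip(*relevant)]
    return [(1 - alpha) * q + alpha * c for q, c in zip(query, centroid)]
```

Each feedback round would call `refine_query` on the user’s relevant images and then `knn` with the moved query point; a single query point is exactly the limitation the next paragraph discusses.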
However, the standard kNN model employed by today’s relevance feedback techniques has one inherent weakness. Due to the semantic gap, semantically similar images may have different appearances and are not necessarily located close to each other in the feature space. Since traditional CBIR techniques focus on searching for the single best-matching neighborhood, their performance suffers when the relevant images lie in multiple neighborhoods far apart from each other in the feature space. For instance, if the user provides visually different but semantically identical example images, those images should be treated separately in the CBIR process.
To address the aforementioned limitations, we propose a Multi-Directional Search (MDS) technique. In this system, a user provides sample images with labels as the query for annotation propagation. First, local clustering is performed on the example images based on their visual features. If the user’s RF images have very different visual features (indicating an intention to annotate groups of images with diverse visual characteristics), the current query is decomposed into multiple sub-queries accordingly, to better exploit the feedback information and provide more precise annotation. As in CBIR, the neighboring images of the multiple sub-queries are returned to the user for further RF, and the sub-queries may be decomposed again. This mechanism enables the proposed technique to explore all relevant neighborhoods in order to cover the k best-matching images wherever they might be, rather than just the top-k ranked images of the single best-matching cluster (while omitting other relevant clusters). The label of the example images is propagated to the top-ranked images in the result set. This iterative process is repeated until the user stops.
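As an illustration of the query-decomposition idea, the following toy sketch groups the feedback images by a simple distance-threshold clustering and emits one sub-query (the cluster centroid) per group. The helper `split_query` and the greedy threshold rule are hypothetical stand-ins for the paper’s dynamic directional clustering, chosen only to make the decomposition concrete.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split_query(feedback, threshold):
    """Group feedback images greedily: an image joins the first existing
    cluster containing a member within `threshold`, else starts a new
    cluster. Each cluster centroid becomes one sub-query point.
    (Greedy assignment; a chain that would merge two clusters is ignored.)"""
    clusters = []
    for img in feedback:
        for c in clusters:
            if any(dist(img, m) <= threshold for m in c):
                c.append(img)
                break
        else:
            clusters.append([img])
    # one sub-query per cluster: the centroid of its members
    return [[sum(d) / len(c) for d in zip(*c)] for c in clusters]
```

With visually diverse feedback, several centroids result and the search proceeds along several directions at once; with compact feedback, the query is left as a single point.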
The primary contributions of this paper are as follows:
- The MDS technique leverages the user’s RF to annotate images with the same semantic meaning but different visual characteristics, precisely and efficiently.
- As the user’s query can be hierarchically split into multiple sub-queries, the annotation speed increases with each new round.
- With the user’s interaction, the annotations are propagated to the retrieved images on and near the query paths. In this manner, the proposed technique can effectively handle an uncaptioned database with an unlimited vocabulary.
The remainder of this paper is organized as follows. Related work is discussed in Section 2. We give a system overview in Section 3. We then explain the proposed Multi-Directional Search technique in Section 4. The annotation mechanism is described in Section 5. The experimental results are discussed in Section 6. Finally, we conclude our study in Section 7.
Section snippets
Related work
Traditional annotation research focuses on learning the mapping between images and keywords [1], [3]. For example, Duygulu et al. [6] proposed a translation model to label image blobs. In [14], a cross-media relevance model is employed to predict the probability of generating a word given the blobs in an image. In the scenario where each word is treated as a distinct class, image annotation can be viewed as a multi-class classification problem. The basic intuition is to learn the most
System overview
The proposed MDS annotation process is depicted in Fig. 1. The whole process starts from an unlabeled database. Upon being presented with randomly selected database images, the user selects relevant images and annotates them. Local clustering (Multi-Directional Search, described in the next section) is then performed on the relevant images. If more than one cluster results, the initial query is decomposed into multiple sub-queries, each representing one cluster, and the sub-query points are
Multi-Directional Search (MDS)
In this section we describe the detailed process of MDS. First, the user provides example images. Second, the system determines, based on their visual features, whether the images are diverse and should be processed separately. If they are diverse, the dynamic directional clustering mechanism decomposes the user’s initial query into multiple sub-queries, splitting the annotation direction into multiple paths. Third, combining the user’s previous behavior by exploiting the directional
Annotation propagation with MDS
In this section, we describe how the user-specified annotations are propagated to images in the database. Using the MDS technique, the user can navigate into multiple image neighborhoods according to his/her interests. Along those navigation paths, the annotations of the example images are propagated.
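The propagation step can be sketched as follows: each sub-query retrieves its k nearest database images, and the example images’ label is attached to them. The function `propagate` and its plain nearest-neighbor ranking are illustrative assumptions for the purpose of this sketch, not the paper’s exact mechanism.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def propagate(db_feats, labels, sub_queries, label, k):
    """Attach `label` to the k database images nearest to each sub-query.
    `labels` maps image index -> set of labels; it is updated in place
    and returned, so repeated feedback rounds accumulate annotations."""
    for q in sub_queries:
        ranked = sorted(range(len(db_feats)), key=lambda i: dist(db_feats[i], q))
        for i in ranked[:k]:
            labels.setdefault(i, set()).add(label)
    return labels
```

Because each sub-query propagates independently, images in well-separated relevant clusters receive the annotation in the same round, which a single-neighborhood search would miss.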
Experimental study
In this section, we discuss the experimental study of the proposed MDS annotation propagation system. We evaluated the system’s accuracy in terms of precision and recall, as well as its efficiency in propagating annotations.
Conclusion
In this paper, we proposed a Multi-Directional Search (MDS) technique. The MDS approach dynamically analyzes the user’s relevance feedback and considers multiple neighborhoods by decomposing the initial query into separate sub-queries. As the sub-queries cover distinct image clusters scattered in the feature space, MDS addresses the inherent weakness of kNN-based RF techniques, namely confining the search to a single neighboring image cluster. Therefore, MDS can simultaneously annotate images with different
References (35)
- K. Barnard, P. Duygulu, D. Forsyth, N. Freitas, D. Blei, M. Jordan, Matching words and pictures, Journal of Machine...
- D. Blei, M.I. Jordan, Modeling annotated data, in: Proceedings of ACM SIGIR,...
- G. Carneiro, N. Vasconcelos, A database centric view of semantic image annotation and retrieval, in: Proceedings of ACM...
- Y. Cao, C. Wang, Z. Li, L. Zhang, L. Zhang, Spatial-bag-of-features, in: Proceedings of CVPR,...
- L. Cao, J. Yu, J. Luo, T. Huang, Enhancing semantic and geographic annotation of web images via logistic canonical...
- P. Duygulu, K. Barnard, N. Freitas, D. Forsyth, Object recognition as machine translation: learning a lexicon for a...
- J. French, X.-Y. Jin, An empirical investigation of the scalability of a multiple viewpoint cbir system, in:...
- et al., Content-Based Image Retrieval: An Overview (2004)
- Greg Hamerly, Charles Elkan, Learning the k in k-means, in: Proceedings of NIPS,...
- et al., A unified log-based relevance feedback scheme for image retrieval, IEEE Transactions on Knowledge and Data Engineering (2006)