Image distance metric learning based on neighborhood sets for automatic image annotation☆
Introduction
In the past decade, with the rapid development of the Internet and the popularization of digital cameras and mobile phones, more and more digital images have been hosted and shared on online image sites. Systems that manage and analyze these images depend heavily on textual annotations. Many existing image search engines, such as Yahoo! and Google, rely on the text associated with an image in its web page. Automatic image annotation (AIA) has therefore become an important and challenging task for helping users retrieve images of interest quickly and efficiently.
Because annotation is an important part of image retrieval, the accuracy of the annotated image semantics directly affects the performance of an image retrieval system. In the past two decades, there have been three main image retrieval technologies [1]: text-based image retrieval (TBIR), content-based image retrieval (CBIR) [2] and semantic-based image retrieval (SBIR) [3]. In TBIR, the images are generally annotated manually; however, the cost of manual annotation is so high that it is difficult to apply to large image databases. Manual annotation limits the number of images that can be retrieved and lowers the efficiency of the retrieval system. CBIR computes relevance based only on the similarity of low-level visual features such as color, texture and shape [4], [5]. In practice, people prefer to retrieve images according to high-level semantic content. Because there is a gap between low-level visual features and high-level semantic content, the performance of many existing content-based image annotation algorithms has not been satisfactory [6], [7], [8].
Because of these difficulties with TBIR and CBIR, attention has turned to SBIR. In SBIR, images that carry semantic labels can be retrieved according to label similarity using TBIR. An AIA approach aims to automatically generate labels that describe the content of a given image. Currently, the most common AIA approaches fall into two types [9], [10]: classification-based and probabilistic modeling-based approaches.
In the first type, AIA is viewed as a classification problem [11] that can be solved by a classifier. To annotate an image without a caption, the image is first represented as a vector of low-level visual features. The image is then classified into one or more categories. Finally, the semantics of the corresponding category are propagated to the given image. In this way, the unlabeled image is annotated automatically.
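The three steps described above (feature representation, classification, label propagation) can be illustrated with a minimal Python sketch. It is not the method of any cited work: a nearest-centroid rule stands in for the classifier as an assumption, and the function names, category names and toy feature values are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def annotate_by_classification(features, training, category_labels):
    """Assign the labels of the closest category to an uncaptioned image.

    features        -- low-level feature vector of the image to annotate
    training        -- dict: category name -> list of training feature vectors
    category_labels -- dict: category name -> list of semantic labels
    """
    # Step 1 is assumed done: `features` is already a low-level feature vector.
    centroids = {c: centroid(vs) for c, vs in training.items()}
    # Step 2: classify (assumption: nearest centroid under Euclidean distance).
    best = min(centroids, key=lambda c: math.dist(features, centroids[c]))
    # Step 3: propagate the labels of the chosen category to the image.
    return best, list(category_labels[best])


# Toy two-dimensional features (made-up numbers for illustration only).
training = {
    "beach": [[0.9, 0.1], [0.8, 0.2]],
    "forest": [[0.1, 0.9], [0.2, 0.8]],
}
category_labels = {"beach": ["sea", "sand"], "forest": ["tree", "grass"]}
category, labels = annotate_by_classification([0.85, 0.2], training, category_labels)
```

Note that the decision in step 2 rests entirely on visual distance in feature space, which is the dependence on visual similarity that this type of approach is known for.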
In the second type, a probabilistic model [12] attempts to infer the joint probabilities between images and semantic concepts. Images of a given class can be regarded as instances of a stochastic process that characterizes the class. Statistical models, such as Markov, Gaussian and Bayesian models, are then trained, and images are classified based on the computed probabilities.
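As a rough illustration of the probabilistic view, the sketch below fits one diagonal Gaussian per label and applies Bayes' rule to obtain label posteriors. This is only one simple instance of the family of models mentioned above, chosen as an assumption for brevity; the function names and the toy data are hypothetical.

```python
import math


def fit_gaussian(vectors):
    """Per-dimension mean and variance (diagonal Gaussian) of one class."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    # A small floor keeps every variance strictly positive.
    var = [
        max(sum((v[i] - mean[i]) ** 2 for v in vectors) / n, 1e-6)
        for i in range(d)
    ]
    return mean, var


def log_likelihood(x, mean, var):
    """log p(x | class) under a diagonal Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * s) - (xi - m) ** 2 / (2 * s)
        for xi, m, s in zip(x, mean, var)
    )


def label_posteriors(x, training):
    """Posterior p(label | x) via Bayes' rule, priors from class frequencies."""
    total = sum(len(vs) for vs in training.values())
    log_joint = {}
    for label, vs in training.items():
        mean, var = fit_gaussian(vs)
        log_joint[label] = math.log(len(vs) / total) + log_likelihood(x, mean, var)
    # Normalise stably: subtract the largest log value before exponentiating.
    top = max(log_joint.values())
    unnorm = {label: math.exp(v - top) for label, v in log_joint.items()}
    z = sum(unnorm.values())
    return {label: u / z for label, u in unnorm.items()}


# Toy training features per label (made-up numbers for illustration only).
training = {
    "sky": [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]],
    "grass": [[0.8, 0.2], [0.9, 0.1], [0.85, 0.3]],
}
posteriors = label_posteriors([0.2, 0.85], training)
best_label = max(posteriors, key=posteriors.get)
```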
Although these two types are the most common annotation approaches, they still have some disadvantages. For example, the classification-based annotation approach relies heavily on visual similarity to judge semantic similarity, yet it is well known that semantic similarity is not equivalent to visual similarity [13], [14]. In addition, in many papers the distance between images is measured with traditional metrics, e.g., Euclidean distance, Mahalanobis distance, Hamming distance, cosine distance and histogram distance [15], [16]. Although these traditional distances are simple and convenient, in many cases they cannot accurately measure the similarity between two images.
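For reference, the traditional distances listed above can be written in a few lines of Python. This is a hedged sketch rather than a definition taken from the cited works: the Mahalanobis distance is restricted to a diagonal covariance matrix to keep the code short, and "histogram distance" is interpreted here as one minus the histogram intersection, which is only one of several common choices.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_distance(x, y):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


def hamming(x, y):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(x, y))


def histogram_intersection_distance(h1, h2):
    """One minus the intersection of two histograms that each sum to 1."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))


def mahalanobis_diagonal(x, y, variances):
    """Mahalanobis distance under the assumption of a diagonal covariance."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances)))


d_euc = euclidean([0.0, 0.0], [3.0, 4.0])
d_cos = cosine_distance([1.0, 0.0], [0.0, 1.0])
d_ham = hamming([1, 0, 1, 1], [1, 1, 0, 1])
d_hist = histogram_intersection_distance([0.5, 0.5, 0.0], [0.25, 0.25, 0.5])
d_mah = mahalanobis_diagonal([0.0, 0.0], [2.0, 3.0], [4.0, 9.0])
```

None of these functions uses label information; each is fixed in advance, which is why a fixed distance may disagree with semantic similarity.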
In this paper, we propose a novel AIA approach, named NSIDML, which learns an image distance metric (IDM) from existing knowledge of the samples and then predicts all possible labels based on the learned IDM using neighborhood sets (NS) [17]. In the proposed NSIDML, NS is used to reduce the bias between visual similarity and semantic similarity. In the proposed image distance metric learning (IDML) algorithm, all training samples are used so as to better utilize existing resources and obtain a more efficient AIA approach. The main contributions of this work are as follows:
- (1) Knowledge of the samples without captions in the training set is fully considered, rather than restricting attention to the training samples with captions, which ensures that the existing resources are fully utilized.
- (2) For an image without a caption, the number of its labels is not predetermined. In other words, the number of labels is determined entirely by the image content.
- (3) In the image annotation process, the proposed AIA approach requires almost no human interaction. In other words, it runs automatically and reduces the impact of human subjectivity.
The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 reviews the preliminary knowledge, including the neighborhood of an image and the block diagram of the AIA approach. Section 4 describes the proposed IDML algorithm. Section 5 introduces the proposed AIA approach, NSIDML, which is based on IDML. Section 6 presents the experimental and comparison results. Finally, conclusions are given in Section 7.
Related work
In this section, we introduce related work on image annotation, in particular the classification-based and probabilistic model-based annotation approaches.
Preliminary
Set theory [28] is the branch of mathematical logic that studies sets, which are collections of objects. The language of set theory can be used to define nearly all mathematical objects. In this paper, NS is applied to address the difficulties of the AIA task.
For further discussion, some necessary notations and definitions are first introduced. Let Tr = {I_1, I_2, …, I_{N_1}} be the set of training images and Te = {I_1, I_2, …, I_{N_2}} the set of testing images without captions, with N_1 + N_2 = n, and an image is represented as a M
Image distance metric learning
In the definition of the neighborhood, the image distance is calculated by the distance function Δ; Δ therefore plays an important role in obtaining an appropriate neighborhood. How to measure image distance more effectively has become a key problem in the field of image recognition.
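The dependence of the neighborhood on Δ can be seen in a small Python sketch. It assumes the usual δ-neighborhood form, i.e. the set of images whose distance to a query image is at most a threshold δ; the exact definition used in this paper is not reproduced here, and the function names, the weights and the toy features are hypothetical.

```python
import math


def neighborhood(query, images, delta, distance=math.dist):
    """Ids of all images whose distance to `query` is at most `delta`."""
    return {
        img_id
        for img_id, feats in images.items()
        if distance(query, feats) <= delta
    }


def weighted_euclidean(weights):
    """Build a distance function that rescales each feature dimension."""
    def dist(x, y):
        return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))
    return dist


# Toy feature vectors (made-up numbers for illustration only).
images = {"a": [0.0, 0.0], "b": [0.1, 2.0], "c": [2.0, 0.1]}
query = [0.0, 0.1]

# Plain Euclidean distance: only image "a" falls inside the neighborhood.
plain = neighborhood(query, images, delta=0.5)

# Down-weighting the second feature pulls image "b" into the neighborhood.
reweighted = neighborhood(
    query, images, delta=0.5, distance=weighted_euclidean([1.0, 0.01])
)
```

With the same threshold, the two distance functions yield different neighborhoods, which is why the choice of Δ matters for everything built on top of the neighborhood.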
In practical applications, the distance between samples is often measured with traditional distances [29], [30], [31]; however, these traditional distances are not always appropriate,
Automatic image annotation approach
In this paper, a novel AIA approach is proposed, called NSIDML. According to [44], [45], for I ∈ Te and a label l ∈ L, we let be a set of the images with label l in , and is the element number of image set . For each label l, is the probability of image I with label l and is the probability of image I without label l. is defined as, and is the number of
Experimental results
In this section, we present experimental results to evaluate the performance of the proposed NSIDML approach.
Conclusions
In this paper, we investigated the application of NS based on IDML to the AIA task. The proposed approach can improve annotation performance. The main advantages of the proposed AIA approach are as follows:
- (1) In the proposed IDML, knowledge of all samples in the training set, both with and without captions, is fully considered, rather than only that of the captioned samples, which ensures that the existing resources are fully utilized. Moreover, in the training set, the
Acknowledgment
This work was supported by the National Social Science Foundation of China (Grant No. 13BTQ050).
References (59)
- et al. A review on automatic image annotation techniques. Pattern Recogn. (2012)
- et al. Automatic image annotation using feature selection based on improving quantum particle swarm optimization. Signal Process. (2015)
- et al. Content based image retrieval based on relative locations of multiple regions of interest using selective regions matching. Inf. Sci. (2014)
- A new matching strategy for content based image retrieval system. Appl. Soft Comput. (2014)
- Hybrid active learning for reducing the annotation effort of operators in classification systems. Pattern Recogn. (2012)
- et al. Simultaneous image classification and annotation based on probabilistic model. J. China U Posts Telecommun. (2012)
- et al. Large margin learning of hierarchical semantic similarity for image classification. Comput. Vis. Image Underst. (2015)
- Interactive tool for image annotation using a semi-supervised and hierarchical approach. Comput. Stand. Interfaces (2013)
- et al. Multi-class particle swarm model selection for automatic image annotation. Expert Syst. Appl. (2012)
- et al. Support vector description of clusters for content-based image annotation. Pattern Recogn. (2014)
- Keybook: Unbias object recognition using keywords. Expert Syst. Appl.
- Optimized normal and distance matching for heterogeneous object modeling. Comput. Ind. Eng.
- Relational interpretations of neighborhood operators and rough set approximation operators. Inf. Sci.
- Content-based multimedia information retrieval: state of the art and challenges. ACM Trans. Multimedia Comput. Commun. Appl.
- A feature-word-topic model for image annotation and retrieval. ACM Trans. Web
- Content-based image retrieval using moments of local ternary pattern. Mob. Networks Appl.
- Image annotation techniques based on feature selection for class-pairs. Knowl. Inf. Syst.
- Two-probabilistic latent semantic model for image annotation and retrieval. Lect. Notes Comput. Sci.
- Ontology-based approach for measuring semantic similarity. Eng. Appl. Artif. Intell.
- Selecting discrete and continuous features based on neighborhood decision error minimization. IEEE Trans. Syst. Man Cybern. B
- Image annotation using SVM
- An adaptive recognition model for image annotation. IEEE Trans. Syst. Man Cybern. C (Appl. Rev.)
- Image annotation using multi-correlation probabilistic matrix factorization
- Supervised learning of semantic classes for image annotation and retrieval. IEEE Trans. Pattern Anal. Mach. Intell.
☆ This paper has been recommended for acceptance by M.T. Sun.