Image distance metric learning based on neighborhood sets for automatic image annotation

https://doi.org/10.1016/j.jvcir.2015.10.017

Highlights

  • Knowledge of both captioned and uncaptioned samples is fully taken into account.

  • The number of labels is completely determined by the image content.

  • The proposed AIA approach can be implemented automatically.

Abstract

Because of the semantic gap between low-level visual features and high-level image semantics, the performance of many existing content-based image annotation algorithms is unsatisfactory. To bridge this gap and improve annotation performance, this paper proposes a novel automatic image annotation (AIA) approach that uses neighborhood sets (NS) built with an image distance metric learning (IDML) algorithm. Because the image distance learned by IDML effectively measures the distance between images for the AIA task, the neighborhood set of each image can be obtained easily. By introducing NS, the proposed AIA approach can predict all plausible labels of an image without caption. Experimental results confirm that introducing NS based on IDML improves the efficiency of AIA and achieves better annotation performance than existing AIA approaches.

Introduction

In the past decade, with the rapid development of the Internet and the popularity of digital cameras and mobile phones, more and more digital images have been hosted and shared on online image sites. Systems that manage and analyze images on these sites depend heavily on textual annotations of the images. Many existing image search engines, such as Yahoo! and Google, rely on the text associated with the web page. Automatic image annotation (AIA), which enables users to retrieve images of interest quickly and efficiently, has become a challenging task.

Image annotation is an important part of image retrieval: the accuracy of the annotated image semantics directly affects the performance of the retrieval system. In the past two decades, there have been three main image retrieval technologies [1]: text-based image retrieval (TBIR), content-based image retrieval (CBIR) [2], and semantic-based image retrieval (SBIR) [3]. In TBIR, images are generally annotated manually; however, manual annotation is so costly that image retrieval over large image databases is difficult to realize. Manual annotation not only limits the number of images that can be retrieved but also makes the retrieval system inefficient. CBIR computes relevance based only on the similarity of low-level visual features such as color, texture, and shape [4], [5]. In practice, however, people prefer to retrieve images according to high-level semantic content. Because there is a gap between low-level visual features and high-level semantic content, the performance of many existing content-based image annotation algorithms has not been satisfactory [6], [7], [8].

Because of these difficulties with TBIR and CBIR, SBIR has attracted attention. In SBIR, images that carry semantic labels can be retrieved according to label similarity using TBIR. An AIA approach aims to automatically generate labels that describe the content of a given image. Currently, the most common AIA approaches fall into two types [9], [10]: classification-based and probabilistic modeling-based approaches.

In the first type, AIA is viewed as a classification problem [11] that can be solved by a classifier. To annotate an image without caption, the image is first represented as a low-level visual feature vector. The image is then classified into one or more categories. Finally, the semantics of the corresponding category are propagated to the image. In this way, an unlabeled image can be annotated automatically.
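The three steps of classification-based annotation (feature vector, classification, label propagation) can be sketched in a few lines. This is a minimal illustration, not any specific method from the cited work: it assumes a nearest-centroid classifier, toy 2-D features standing in for color/texture descriptors, and a hypothetical `CATEGORY_LABELS` table mapping each category to the labels it propagates.

```python
import numpy as np

# Hypothetical category -> label-set mapping (illustrative, not from the paper).
CATEGORY_LABELS = {
    "beach": {"sand", "sea", "sky"},
    "forest": {"tree", "grass"},
}

def fit_centroids(features, categories):
    """Step 0: learn one mean feature vector per category (nearest-centroid classifier)."""
    centroids = {}
    for cat in set(categories):
        rows = [f for f, c in zip(features, categories) if c == cat]
        centroids[cat] = np.mean(rows, axis=0)
    return centroids

def annotate(feature, centroids, category_labels):
    """Steps 1-3: take the image's feature vector, classify it, propagate the category's labels."""
    feature = np.asarray(feature, dtype=float)
    best = min(centroids, key=lambda c: np.linalg.norm(feature - centroids[c]))
    return best, set(category_labels[best])

# Toy 2-D "visual features" for four captioned training images.
train_x = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
train_y = ["beach", "beach", "forest", "forest"]
centroids = fit_centroids(train_x, train_y)
print(annotate([0.85, 0.15], centroids, CATEGORY_LABELS))
```

Note that every image assigned to a category receives exactly that category's label set, which is one way visual similarity ends up standing in for semantic similarity.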

In the second type, a probabilistic model [12] attempts to infer the joint probabilities between images and semantic concepts. Images of a given class are regarded as instances of a stochastic process that characterizes the class. Statistical models, such as Markov, Gaussian, and Bayesian models, are then trained, and images are classified based on probability computation.
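As one concrete instance of the probabilistic view, the sketch below fits a diagonal Gaussian class-conditional density p(x | c) and a prior P(c) per class, then classifies by the largest log-posterior (a Gaussian naive Bayes model). It is only an illustration of "training a statistical model and classifying by probability computation"; the toy features, class names, and the `eps` variance floor are assumptions.

```python
import numpy as np

def fit_gaussians(features, classes, eps=1e-6):
    """Fit a diagonal Gaussian p(x | c) and a prior P(c) for each class c."""
    features = np.asarray(features, dtype=float)
    classes = np.asarray(classes)
    model = {}
    for c in np.unique(classes):
        x = features[classes == c]
        # (mean, variance with a small floor, prior)
        model[c] = (x.mean(axis=0), x.var(axis=0) + eps, len(x) / len(features))
    return model

def log_posterior(x, mean, var, prior):
    """Unnormalised log P(c | x) = log p(x | c) + log P(c) for a diagonal Gaussian."""
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
    return log_lik + np.log(prior)

def classify(x, model):
    """Return the class whose model assigns the highest posterior to x."""
    x = np.asarray(x, dtype=float)
    return max(model, key=lambda c: log_posterior(x, *model[c]))

# Toy training data: two concept classes in a 2-D feature space.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = ["sea", "sea", "tree", "tree"]
model = fit_gaussians(train_x, train_y)
print(classify([0.85, 0.15], model))
```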

Although these two types are the most common annotation approaches, they still have disadvantages. For example, classification-based annotation relies heavily on visual similarity to judge semantic similarity, yet it is well known that semantic similarity is not equivalent to visual similarity [13], [14]. In addition, many papers measure the distance between images with traditional measures, e.g., Euclidean, Mahalanobis, Hamming, cosine, and histogram distances [15], [16]. Although these traditional distances are simple and convenient, they cannot accurately measure the similarity between two images in many cases.
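For reference, the traditional distances listed above can be written directly with NumPy. These are the standard textbook definitions; "histogram distance" covers many variants, and the sketch assumes histogram intersection (1 minus the summed bin-wise minimum of L1-normalised histograms) as one common choice. None of these functions is learned from data, which is the limitation the paragraph points out.

```python
import numpy as np

def euclidean(x, y):
    """Straight-line (L2) distance."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))

def mahalanobis(x, y, cov):
    """sqrt(d^T S^-1 d) for a fixed covariance S; solve() avoids an explicit inverse."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(d @ np.linalg.solve(np.asarray(cov, float), d)))

def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def cosine_distance(x, y):
    """1 - cosine similarity."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 1.0 - float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def histogram_intersection_distance(h, g):
    """1 - sum(min(h, g)) after L1-normalising both histograms (assumed variant)."""
    h, g = np.asarray(h, float), np.asarray(g, float)
    h, g = h / h.sum(), g / g.sum()
    return 1.0 - float(np.minimum(h, g).sum())

print(euclidean([3, 4], [0, 0]))
print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))
```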

In this paper, we propose a novel AIA approach, named NSIDML, which learns an image distance metric (IDM) from the existing knowledge of the samples and predicts all possible labels using neighborhood sets (NS) [17] built on the learned IDM. In NSIDML, NS is used to reduce the bias between visual similarity and semantic similarity. The proposed image distance metric learning (IDML) algorithm uses all training samples, in order to make better use of existing resources and obtain a more efficient AIA approach. The main contributions of this work are as follows:

  • (1)

    Knowledge of the samples without caption in the training set is fully taken into account, rather than being limited to the training samples with caption, which ensures that existing resources are fully utilized.

  • (2)

    For an image without caption, the number of labels is not predetermined; in other words, the number of labels is determined entirely by the image content.

  • (3)

    The proposed AIA approach requires almost no human interaction during annotation. In other words, it runs automatically and reduces the influence of human subjectivity.

The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 reviews preliminary knowledge, including the neighborhood of an image and a block diagram of the AIA approach. Section 4 describes the proposed IDML algorithm. Section 5 introduces the proposed AIA approach, NSIDML, based on IDML. Section 6 presents the experimental and comparison results. Finally, conclusions are given in Section 7.


Related work

In this section, we introduce related work on image annotation, in particular classification-based and probabilistic model-based annotation approaches.

Preliminary

Set theory [28] is the branch of mathematical logic that studies sets, which are collections of objects. The language of set theory can be used to define nearly all mathematical objects. In this paper, NS is applied to address the difficulties of the AIA task.

For further discussion, some necessary notation and definitions are first introduced. Let $Tr = \{I_1, I_2, \dots, I_{N_1}\}$ be the set of training images and $Te = \{I_1, I_2, \dots, I_{N_2}\}$ the set of testing images without caption, where $N_1 + N_2 = n$, and an image is represented as an M

Image distance metric learning

In the definition of the neighborhood, the image distance is computed by the distance function Δ; therefore, Δ plays an important role in obtaining an appropriate neighborhood. How to measure image distance more effectively has become a key problem in the field of image recognition.

In practical applications, the distance between samples has often been measured with traditional distances [29], [30], [31]; however, these traditional distances are not always appropriate,
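The excerpt does not show the IDML algorithm itself. To illustrate what learning an image distance from labeled data means, the sketch substitutes a different, well-known technique: a simplified Relevant Components Analysis (RCA), which learns a Mahalanobis matrix M = C⁻¹, where C is the average within-group covariance of images sharing a label. This is not the proposed IDML; in particular, unlike the IDML described in this paper, it uses only captioned samples. The `eps` regulariser and the toy data are assumptions.

```python
import numpy as np

def rca_metric(features, groups, eps=1e-6):
    """Return M = C^-1, with C the pooled covariance of group-centred samples (RCA-style)."""
    features = np.asarray(features, dtype=float)
    groups = np.asarray(groups)
    centred = np.vstack([
        features[groups == g] - features[groups == g].mean(axis=0)
        for g in np.unique(groups)
    ])
    cov = centred.T @ centred / len(centred)
    return np.linalg.inv(cov + eps * np.eye(features.shape[1]))

def learned_distance(x, y, M):
    """Mahalanobis distance under the learned matrix M."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ M @ d))

# Toy data: feature 0 separates the two labels; feature 1 varies a lot within a label.
X = [[0.0, -1.0], [0.2, 1.0], [0.2, -1.0], [0.0, 1.0],   # label "a"
     [1.0, -1.0], [1.2, 1.0], [1.2, -1.0], [1.0, 1.0]]   # label "b"
y = ["a"] * 4 + ["b"] * 4
M = rca_metric(X, y)
same_label = learned_distance([0.0, -1.0], [0.0, 1.0], M)
cross_label = learned_distance([0.0, 0.0], [1.0, 0.0], M)
print(round(same_label, 3), round(cross_label, 3))
```

Under the Euclidean distance, the cross-label pair (distance 1) looks closer than the same-label pair (distance 2); after learning, the same-label pair is much closer, because the within-label variation along feature 1 is down-weighted relative to feature 0.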

Automatic image annotation approach

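In this paper, a novel AIA approach, called NSIDML, is proposed. Following [44], [45], for $I \in Te$ and a label $l \in L$, let $\delta_l(I)$ be the set of images with label $l$ in $\delta(I)$, and let $|\delta_l(I)|$ be the number of elements in $\delta_l(I)$. For each label $l$, $P(l^+ \mid \delta_l(I))$ is the probability that image $I$ has label $l$, and $P(l^- \mid \delta_l(I))$ is the probability that image $I$ does not have label $l$. $P(l^+ \mid \delta_l(I))$ is defined as

$$P(l^+ \mid \delta_l(I)) = \begin{cases} 1, & \text{if } |\delta_l(I)| = |\delta(I)|, \\ 0, & \text{if } |\delta_l(I)| = 0, \\ \omega, & \text{otherwise}, \end{cases} \qquad P(l^- \mid \delta_l(I)) = 1 - P(l^+ \mid \delta_l(I)),$$

and $|\delta(I)|$ is the number of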
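The three-case definition (probability 1 when every image in δ(I) carries label l, 0 when none does, ω otherwise) can be sketched as follows. The value of ω is not given in this excerpt, so the code takes it as a function and, purely as a hypothetical default, uses the fraction |δl(I)|/|δ(I)|. The decision rule "keep l when P(l+) > P(l-)" is likewise an assumption; it illustrates how the number of predicted labels can vary with the neighborhood rather than being fixed in advance. An empty neighborhood satisfies both of the first two cases, so the sketch rejects it explicitly.

```python
def fraction_omega(n_l, n):
    """Hypothetical choice of ω: share of neighbours carrying the label."""
    return n_l / n

def label_probability(neighbour_labels, label, omega=fraction_omega):
    """P(l+ | δ_l(I)) from the label sets of the images in δ(I)."""
    n = len(neighbour_labels)                                   # |δ(I)|
    if n == 0:
        raise ValueError("empty neighborhood: the definition is ambiguous")
    n_l = sum(label in labels for labels in neighbour_labels)   # |δ_l(I)|
    if n_l == n:
        return 1.0
    if n_l == 0:
        return 0.0
    return omega(n_l, n)

def predict_labels(neighbour_labels, vocabulary, omega=fraction_omega):
    """Keep every label l with P(l+ | δ_l(I)) > P(l- | δ_l(I)) (assumed rule)."""
    predicted = set()
    for l in vocabulary:
        p_pos = label_probability(neighbour_labels, l, omega)
        if p_pos > 1.0 - p_pos:
            predicted.add(l)
    return predicted

# Label sets of three captioned images assumed to form δ(I) for a test image I.
neighbours = [{"sky", "sea"}, {"sky", "sand"}, {"sky", "sea"}]
print(sorted(predict_labels(neighbours, {"sky", "sea", "sand", "tree"})))
```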

Experimental results

In this section, we present experimental results that evaluate the performance of the proposed NSIDML approach.

Conclusions

In this paper, we investigated the application of NS based on IDML to the AIA task. The proposed approach improves AIA performance. Its main advantages are as follows:

  • (1)

    In the proposed IDML, knowledge of all samples in the training set, both with and without caption, is fully taken into account rather than being limited to the samples with caption, which ensures that existing resources are fully utilized. Moreover, in the training set, the

Acknowledgment

This work was supported by the National Social Science Foundation of China (Grant No. 13BTQ050).

References (59)

  • W.L. Hoo et al., Keybook: Unbias object recognition using keywords, Expert Syst. Appl. (2015).
  • K. Samanta et al., Optimized normal and distance matching for heterogeneous object modeling, Comput. Ind. Eng. (2014).
  • Y.Y. Yao, Relational interpretations of neighborhood operators and rough set approximation operators, Inf. Sci. (1998).
  • M.S. Lew et al., Content-based multimedia information retrieval: state of the art and challenges, ACM Trans. Multimedia Comput. Commun. Appl. (2006).
  • C. Wang, L. Zhang, H.J. Zhang, Learning to reduce the semantic gap in web image retrieval and annotation, in: The 31st...
  • C.T. Nguyen et al., A feature-word-topic model for image annotation and retrieval, ACM Trans. Web (2013).
  • P. Srivastava et al., Content-based image retrieval using moments of local ternary pattern, Mob. Networks Appl. (2014).
  • J.J. Lu et al., Image annotation techniques based on feature selection for class-pairs, Knowl. Inf. Syst. (2010).
  • N. Watcharapinchai et al., Two-probabilistic latent semantic model for image annotation and retrieval, Lect. Notes Comput. Sci. (2011).
  • M.A.H. Taieb et al., Ontology-based approach for measuring semantic similarity, Eng. Appl. Artif. Intell. (2014).
  • Q.H. Hu et al., Selecting discrete and continuous features based on neighborhood decision error minimization, IEEE Trans. Syst. Man Cybern. B (2010).
  • C. Cusano et al., Image annotation using SVM.
  • L.J. Li, R. Socher, F.F. Li, Towards total scene understanding: classification, annotation and segmentation in an...
  • Z.H. Chen et al., An adaptive recognition model for image annotation, IEEE Trans. Syst. Man Cybern. C (Appl. Rev.) (2012).
  • M. Guillaumin, T. Mensink, J. Verbeek, C. Schmid, Tagprop: Discriminative metric learning in nearest neighbor models...
  • Y. Mori, H. Takahashi, R. Oka, Image-to-word transformation based on dividing and vector quantizing images with words,...
  • C. Wang, D. Blei, F.F. Li, Simultaneous image classification and annotation, in: IEEE Computer Society Conference on...
  • Z.C. Li et al., Image annotation using multi-correlation probabilistic matrix factorization.
  • G. Carneiro et al., Supervised learning of semantic classes for image annotation and retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2007).

This paper has been recommended for acceptance by M.T. Sun.