Elsevier

Applied Soft Computing

Volume 13, Issue 2, February 2013, Pages 1292-1302
Combination of classification and regression in decision tree for multi-labeling image annotation and retrieval

https://doi.org/10.1016/j.asoc.2012.10.019

Abstract

This paper proposes a semantic-based image retrieval approach, that is, one that lets users search image datasets by keyword. Keyword search becomes possible once textual metadata, called image annotation, is attached to the images. A combination of classification and regression in a decision tree (DT) is employed for multi-labeling image annotation, in which more than one label is considered for every tuple. In the proposed approach, every DT leaf node stores all concepts together with their corresponding ranks, instead of storing only a single concept or a single rank. A hierarchical network of semantics is used to improve performance. The main idea is that, in each leaf node, the system should give a higher rank to the concepts with the highest degree of purity and detail according to the prepared hierarchical semantic network. SAIAPR-TC12, an image dataset that is already segmented, feature-extracted and annotated, is used for evaluation. Its hierarchy of 256 semantic concepts, which were used in the annotation process, makes it well suited to testing the approach. Experimental results confirm that our approach performs better than single-labeling approaches, which assign only one class to every tuple and support only a linear relationship among concepts.

Highlights

► The evaluation method has been changed to 4-fold cross validation.
► F-Measure has been added as another measure for performance evaluation.
► Some grammar errors have been fixed.

Introduction

With the rapid growth in the amount of multimedia information such as digital images, systems that organize this information for search and retrieval have become necessary. In the last decade, image retrieval (IR) has attracted a great deal of research aimed at simplifying the organization of huge numbers of images [5]. There are three main generations of IR systems [1]. Text-based image retrieval systems came first; they rely solely on text metadata provided by humans. People assigned tags to images, and the system retrieved images based on those labels. Newer systems employ web mining as their metadata provider [11]. Today's image search engines, such as Google and Yahoo, are modern text-based image retrieval systems that rely solely on the text surrounding images in web pages. With manual annotation, a large collection of images took a great deal of time to annotate, which was exhausting for the human annotators. A second problem was the subjectivity of human annotation: different people may infer different things from the same image.

These two problems made text-based systems deficient, and so a system for the automatic processing and retrieval of images was sought. In the second generation, content-based image retrieval (CBIR) systems, also called visual information retrieval systems (VIRS), which process image features automatically, became prevalent [5], [6]. The classical paradigm for content-based image retrieval is query by visual example [24]. The main difference between the two generations is that a human is the main component of the former [4]. CBIR systems return the database images most similar to the query image provided by the user [18]. One of their biggest shortcomings was that they did not look for concepts within an image and ranked image similarity only by visual content such as color, texture and shape [21]. For example, two images with different concepts, such as Sunset and Orange, might be considered similar because they have similar color histograms. Another problem was that a query image always had to be supplied [1].
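The Sunset and Orange example can be made concrete with a small sketch. It assumes a coarse joint RGB histogram and histogram intersection as the similarity measure; both choices, the function names and the pixel values are illustrative assumptions, not details taken from the paper.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram; pixels is a list of (r, g, b) values in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel to a bin index, then flatten the three indices into one.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the two normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical pixel data: a sunset photo and a photo of an orange (the fruit).
sunset = [(250, 120, 20)] * 90 + [(40, 20, 60)] * 10
orange = [(245, 110, 30)] * 85 + [(255, 255, 255)] * 15

similarity = histogram_intersection(color_histogram(sunset), color_histogram(orange))
print(f"similarity = {similarity:.2f}")
```

The two images share most of their mass in the same orange-colored bin, so a purely visual measure scores them as close even though their concepts differ.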

These shortcomings left users unsatisfied, and semantic-based image retrieval (SBIR) systems appeared as a solution. SBIR systems can detect the concepts in images and let users search for high-level semantics regardless of the images' low-level features. The development of image retrieval systems led to the emergence of Automatic Image Annotation (AIA) systems, in which a machine takes over the role that humans play in text-based systems and provides textual metadata for images based on their low-level features [8], [12], [13], [14], [25]. With AIA, a user can retrieve images by searching for their semantics. The main goal of image annotation is to make keyword-based image search feasible [2], [22]. The main idea of AIA is to learn a model automatically from a large number of image samples [1]. In this paper, we use a decision tree (DT) to learn that model.

The decision tree can select the most discriminatory features, is comprehensible to humans and can deal with noisy and incomplete data, among other strengths, which has made it widely applicable to classification and data mining problems [15], [16], [26], [27]. The similarity between the way humans interpret images and the way a decision tree induces concepts makes this tool well suited to image classification and retrieval [9].

Among the DT construction algorithms, ID3, C4.5 and CART are the best known [1]. They differ from each other in three respects: (1) the feature types they support (continuous vs. discrete), (2) the feature selection criteria and (3) the final node insertion process.

ID3 is known as the first and simplest DT construction algorithm. Although it shares some features with the other algorithms, such as selecting the most discriminatory features, it also has characteristics of its own. Simplicity and comprehensibility are its advantages. On the other hand, it supports only discrete input values, and no matter how efficient the discretization algorithm is, ID3 has to work with data that no longer reflect the real values when continuous inputs, such as image features, are presented. C4.5 [19] was developed to address this problem, but it can be used only for data classification. CART [20] is another well-known DT construction algorithm; it creates a binary tree that can also be used in regression problems.
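To illustrate how a continuous feature can be handled without prior discretization, the following minimal sketch searches for the binary threshold with the highest information gain. It is a simplification: C4.5 itself selects splits by gain ratio, and the feature values, labels and function names here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the binary split of a continuous feature
    that maximizes information gain; threshold is None if no split helps."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - remainder > best_gain:
            best_gain, best_t = base - remainder, t
    return best_t, best_gain


# Hypothetical continuous feature (e.g. a normalized mean hue) for six regions.
hue = [0.05, 0.10, 0.12, 0.55, 0.60, 0.70]
concept = ["sunset", "sunset", "sunset", "sky", "sky", "sky"]
threshold, gain = best_threshold(hue, concept)
print(threshold, gain)
```

The threshold is chosen from the data itself, so the tree works with the actual feature values rather than with bins fixed in advance.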

Regardless of their individual advantages and disadvantages, all of these algorithms generate a tree with only one class (or one value, for regression trees) in each leaf node. In some situations, for example True-or-False classification problems, assigning only one class to each tuple works well, but for other problems it does not. The issue is more challenging in SBIR because an image can cover more than one concept simultaneously. For example, assigning just one class among Truck, Road and Sunset to Fig. 1 would be unfair, so single-labeling, that is, assigning only one class to each feature vector, is unsatisfactory. Region-Based Image Retrieval (RBIR) [7] can handle this, but it still cannot determine how much a region belongs to a particular class. So we can only say crisply whether an image or a region covers a concept or not. Furthermore, segmentation is very hard to do.

Chen et al. [12] proposed an approach for constructing a DT able to deal with hierarchical class labels. They named it the Hierarchical Decision Tree (HDT) and achieved a higher accuracy rate for data classification. In their approach, the final concepts are arranged hierarchically, and a new measure for selecting features and splitting the DT yields a more accurate tree. Section 2 describes it in more detail.

Although they used a hierarchical organization of semantic concepts and achieved better performance than with a linear (flat) organization, their work is still a single-labeling approach.

In this paper, we combine classification and regression in a DT construction process based on HDT to achieve multi-labeling image annotation. Multi-labeling image annotation provides more than one class label for every tuple, where a tuple corresponds to an image or a region. In our approach, instead of choosing only one concept in each DT leaf, we consider all concepts and their ranks. The method is classification because the final nodes hold discrete semantic classes, and it is also regression because we assign continuous values to these discrete classes as their ranks. The rank of a concept is influenced by its amount of detail, given by its location in the hierarchical network of semantic concepts, and by the degree of purity it provides. All tuples are thus linked to all concepts through their corresponding ranks, and the system can determine how much an image (or a region) covers a concept.
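A minimal sketch of a multi-label leaf is given here for orientation only. The rank formula is not stated in this excerpt, so the code assumes, purely for illustration, that rank = purity × (level of the concept / deepest level). The concept hierarchy and all names are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy (child -> parent); None marks the root.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "truck": "vehicle",
    "landscape": "entity",
    "road": "landscape",
}


def path_to_root(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def leaf_ranks(leaf_labels):
    """Return {concept: rank} for every concept covered by the tuples in a leaf.

    Assumed formula (not from the paper): purity of the concept in the leaf
    multiplied by its relative level in the hierarchy, so that specific
    concepts outrank their more general ancestors at equal purity.
    """
    n = len(leaf_labels)
    covered = Counter()
    for label in leaf_labels:
        covered.update(path_to_root(label))  # a tuple also covers its ancestors
    max_level = max(len(path_to_root(c)) for c in PARENT)
    return {
        c: (count / n) * (len(path_to_root(c)) / max_level)
        for c, count in covered.items()
    }


# A leaf that received three Truck tuples and one Road tuple during training.
ranks = leaf_ranks(["truck", "truck", "truck", "road"])
for concept, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{concept}: {rank:.3f}")
```

A single-labeling tree would keep only the top entry of this leaf; keeping the whole dictionary is what lets every tuple be linked to every concept with a continuous rank.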

The rest of this paper is organized as follows. Section 2 describes how a DT is built from data with hierarchical structures. Section 3 discusses the proposed system and its components. Section 4 compares the results of the multi-labeling and single-labeling approaches, and Section 5 briefly concludes the paper.

Section snippets

DT building process

To achieve multi-labeling annotation, an algorithm able to determine more than one class for an image (or a region) is needed. This can be considered from two points of view: either the labels are totally separate, or there is a relationship among them. In the second case, when the labels are not completely unrelated, a hierarchical structure can be used to represent them and their relationships.

System description

The overall diagram of the proposed system is shown in Fig. 5. The system can be divided into two parts. The first part is responsible for building a DT and comprises, in order, data normalization, data discretization, preparing the ontology and building the DT according to the proposed approach. Calculating the rank of each tuple for the concepts, creating indices for each concept and returning results for the query (a semantic concept) constitute the second part, which is the retrieval

Performance evaluation

The system works offline, so the annotation and retrieval processes are completely separate. In the first step, some images are used as a training set; after the DT is mined, the annotation process is carried out and the semantic ranks for all images are stored in a database. When a user looks for a particular concept, they only have to select it from a list. The system then returns the images with the highest ranks for the selected concept.
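The offline annotation and query-time lookup described above can be sketched as a per-concept index sorted by rank. The in-memory dictionaries stand in for the database, and the image identifiers, rank values and function names are hypothetical.

```python
def build_concept_index(image_ranks):
    """Offline step: turn {image_id: {concept: rank}} into
    {concept: [(image_id, rank), ...]} sorted by descending rank."""
    index = {}
    for image_id, ranks in image_ranks.items():
        for concept, rank in ranks.items():
            index.setdefault(concept, []).append((image_id, rank))
    for postings in index.values():
        postings.sort(key=lambda posting: posting[1], reverse=True)
    return index


def retrieve(index, concept, top_k=10):
    """Query step: return the ids of the top_k images for the selected concept."""
    return [image_id for image_id, _ in index.get(concept, [])[:top_k]]


# Hypothetical semantic ranks produced by the annotation step.
stored_ranks = {
    "img1": {"truck": 0.75, "road": 0.25},
    "img2": {"truck": 0.20, "road": 0.80},
    "img3": {"sunset": 0.90, "road": 0.40},
}
index = build_concept_index(stored_ranks)
print(retrieve(index, "road", top_k=2))
```

Because the index is built once after annotation, answering a query involves only a lookup and a slice, which matches the separation between the two processes.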

One of the biggest challenges in image retrieval systems,

Conclusion

In this paper, multi-labeling image annotation was achieved by combining classification and regression in a DT. Instead of selecting just the one class with the highest rank, all concepts and their ranks were kept in the final leaf nodes. A hierarchical network was employed to represent the relationships among semantic classes. The proposed system overcame the weakness of the single-labeling approach in establishing a trade-off between accuracy and detail. The multi-labeling approach considers all

References (27)

  • H. Müller et al.

    Performance evaluation in content-based image retrieval: overview and proposals

    Pattern Recognition Letters

    (2001)
  • Z. Hong et al.

    Query expansion by text and image features in image retrieval

    Journal of Visual Communication and Image Representation

    (1998)
  • Y. Liu et al.

    A survey of content-based image retrieval with high-level semantics

    Pattern Recognition

    (2007)