Pattern Recognition

Volume 40, Issue 1, January 2007, Pages 19-32

Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error

https://doi.org/10.1016/j.patcog.2006.07.002

Abstract

Image classification arises as an important phase in the overall process of automatic image annotation and image retrieval. In this study, we are concerned with the design of image classifiers developed in the feature space formed by low-level primitives defined in the setting of the MPEG-7 standard. Our objective is to investigate the discriminatory properties of such standard image descriptors and to look at efficient architectures of the classifiers along with their design pursuits. The generalization capabilities of an image classifier are essential to its successful use in image retrieval and annotation. Intuitively, the classifier is expected to achieve high classification accuracy on unseen images that are quite “similar” to those occurring in the training set. On the other hand, its performance cannot be guaranteed on images that are very dissimilar from the elements of the training set. Following this observation, we develop and use a concept of the localized generalization error and show how it guides the design of the classifier. As the image classifier, we consider radial basis function neural networks (RBFNNs). Through intensive experimentation we show that the resulting classifier outperforms other classifiers such as multi-class support vector machines (SVMs) as well as “standard” RBFNNs (viz. those developed without the guidance offered by the optimization of the localized generalization error). The experimental studies also reveal some interesting interpretation abilities of the RBFNN classifiers related to their receptive fields.

Introduction

The vast number of digital images that have become omnipresent these days calls for an intensified effort towards building efficient mechanisms for their automatic annotation and retrieval. Classification of digital images is one of the fundamental activities and can be viewed as a prerequisite for most other image processing pursuits. In image classification we can follow the general paradigm of pattern recognition, in which each object is described by a collection of features that forms a multidimensional space where all discrimination activities take place. Various classifiers, both linear and nonlinear, are available at this stage, including support vector machines (SVMs), linear classifiers, polynomial classifiers, radial basis function neural networks (RBFNNs), fuzzy rule-based systems, etc. No matter which classifier is chosen, the formation of a suitable feature space is of paramount relevance. Forming the feature space in the case of images is even more complicated. On the one hand, there are many different alternatives; on the other hand, the diversity of images contributes to an elevated level of complexity and difficulty: images showing different shapes, colors, textures, etc. may still belong to the same class. An image could be described by the color intensity of each pixel or, even better, by some descriptors. In this study, our objective is to explore and quantify the discriminatory properties of the MPEG-7 image descriptors in classification problems. These properties are explored in conjunction with two main categories of classifiers, namely SVMs and RBFNNs.

Unfortunately, a classifier achieving high training accuracy does not necessarily achieve good generalization capability. Since both the target outputs and the distribution of the unseen samples are unknown, it is impossible to compute the generalization error directly. There are two major approaches to estimating the generalization error, namely analytical models and cross-validation (CV). In general, analytical models bound the generalization error from above for any unseen samples and do not distinguish trained classifiers that have the same number of effective parameters but different parameter values. Thus, the error bounds given by those models are usually loose [1]. The major problem with analytical models is the estimation of the number of effective parameters of the classifier, which can be addressed by using the VC-dimension [2]. The VC-dimension of a classifier is defined as the largest number of samples that can be shattered by this classifier [2]. However, only loose bounds on the VC-dimension can be found for nonlinear classifiers such as neural networks, which puts a severe limitation on the applicability of analytical models to nonlinear classifiers, except for the SVM [3]. Although CV uses true target outputs for unseen samples, it is time-consuming for large datasets: C·L classifiers must be trained for a C-fold CV with L choices of classifier parameters. CV methods estimate the expected generalization error instead of its bound and thus do not guarantee that the finally built classifier has good generalization capability [1].
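
To make the cost argument concrete, here is a minimal sketch of our own (not part of the original study) showing that C-fold CV over L candidate parameter settings trains C·L classifiers; the toy data, the SVM classifier, and the parameter grid are placeholders.

```python
# Sketch: C-fold cross-validation over L parameter choices trains C * L classifiers.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = np.random.rand(200, 10), np.random.randint(0, 3, 200)  # toy 3-class data
param_grid = [0.1, 1.0, 10.0]                                  # L = 3 candidate C values
folds = KFold(n_splits=5)                                      # C = 5 folds

trained, scores = 0, {}
for c in param_grid:                           # L parameter choices
    fold_scores = []
    for tr_idx, te_idx in folds.split(X):      # C folds
        clf = SVC(C=c).fit(X[tr_idx], y[tr_idx])
        fold_scores.append(clf.score(X[te_idx], y[te_idx]))
        trained += 1
    scores[c] = np.mean(fold_scores)

print(trained)  # C * L = 15 classifiers trained in total
```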

In image classification, one may not expect a classifier trained using one category of images (say, animals) to correctly classify images coming from some other category (e.g. vegetables). In this case, one may revise the training dataset by adding training samples of vegetables and re-train the classifier to include the new class of images. For example, our dataset contains images of cows but not of airplanes, so we cannot expect the classifier trained on this dataset to correctly recognize airplanes. An image classifier is expected to work well for those classes that have been used to train it, assuming that images belonging to the same class are conceptually similar so that their descriptor values are also similar. That is, unseen samples that are similar to the training samples, in the sense that their sup-type (L∞) distance in the feature space is smaller than a given threshold, are considered to be more important. Thus, in the evaluation of the generalization capabilities of image classifiers, one may ignore those images that are totally dissimilar to those existing in the training set.
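
As a small illustration of this notion of similarity (a sketch of ours, with an arbitrary threshold Q and toy feature vectors), an unseen sample is regarded as relevant when its sup-type distance to at least one training sample does not exceed Q:

```python
# Sketch: membership test for the Q-neighbourhood of the training set
# under the sup-type (L-infinity) distance.
import numpy as np

def in_q_neighbourhood(x_unseen, X_train, Q):
    # sup-norm distance from x_unseen to every training sample
    d = np.max(np.abs(X_train - x_unseen), axis=1)
    return bool(np.any(d <= Q))

X_train = np.array([[0.2, 0.5], [0.8, 0.1]])
print(in_q_neighbourhood(np.array([0.25, 0.45]), X_train, Q=0.1))  # True: "similar" image
print(in_q_neighbourhood(np.array([0.90, 0.90]), X_train, Q=0.1))  # False: "dissimilar" image
```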

In general, image classification problems are multi-class classification problems, for which it is difficult to find a classifier with good generalization properties. In this work, we aim to find an image classifier featuring better generalization capability and interpretability with respect to domain knowledge in image classification. We concentrate on finding an optimal number of receptive fields for the RBFNN so that it classifies unseen images with a lower generalization error.

We organize the study in the following manner. The starting point is a discussion on the formation of the feature space based upon the framework of descriptors available in the MPEG-7 standard; these issues are covered in Section 2. We provide a brief introduction to image classifiers in Section 3. The localized generalization error model (R_SM*) and the corresponding approach to the selection of the architecture of the network are described in Sections 4 and 5, respectively. We present a comprehensive suite of experimental studies in Section 6. Concluding comments are covered in Section 7.

Section snippets

MPEG-7 feature space

In this section, we elaborate on the feature space arising within the framework of MPEG-7. The MPEG-7 descriptors are useful for low-level matching and provide great flexibility for a wide range of applications.
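
As an illustration only (the descriptor names and dimensionalities below are common MPEG-7 defaults used as placeholders, not necessarily the exact set adopted in this study), a per-image feature vector can be formed by concatenating the individual descriptor vectors:

```python
# Sketch: building one feature vector per image from low-level MPEG-7 descriptors.
import numpy as np

def image_feature_vector(descriptors):
    """descriptors: dict mapping descriptor name -> 1-D numpy array."""
    order = ["ColorLayout", "ColorStructure", "EdgeHistogram", "HomogeneousTexture"]
    return np.concatenate([np.asarray(descriptors[name], dtype=float) for name in order])

demo = {
    "ColorLayout": np.zeros(12),         # placeholder dimensionalities
    "ColorStructure": np.zeros(32),
    "EdgeHistogram": np.zeros(80),
    "HomogeneousTexture": np.zeros(62),
}
print(image_feature_vector(demo).shape)  # (186,)
```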

Classifiers for image classification

In this section, we discuss several selected architectures of classifiers that are quite often encountered in image classification. It is of interest to investigate their properties in this setting and review some related development strategies.

A concept and realization of the localized generalization error

Given the anticipated diversity of images to be classified, one could easily envision that there is no classification algorithm capable of carrying out zero-error classification. This straightforward and very intuitive observation holds in particular for images that are very different from those the classifier was exposed to during the training phase. In other words, we acknowledge that any classifier comes with some limited generalization capabilities. In terms of the …
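
One rough way to read this idea (a Monte-Carlo sketch of ours, not the closed-form R_SM* expression developed in this section; `model.predict`, the squared-error measure, and the uniform perturbation scheme are assumptions) is to combine the training error with a stochastic sensitivity term measured only within the Q-neighbourhood of the training samples:

```python
# Sketch: localized generalization error estimate = training error + stochastic
# sensitivity of the outputs for inputs perturbed within the Q-neighbourhood.
import numpy as np

def localized_error_estimate(model, X_train, y_train, Q, n_perturb=50, rng=None):
    rng = np.random.default_rng(rng)
    y_hat = model.predict(X_train)
    train_err = np.mean((y_hat - y_train) ** 2)   # empirical (training) error

    # stochastic sensitivity: mean squared output change under perturbations
    # drawn uniformly from [-Q, Q] per feature (i.e. staying in the Q-neighbourhood)
    diffs = []
    for _ in range(n_perturb):
        dx = rng.uniform(-Q, Q, size=X_train.shape)
        diffs.append(np.mean((model.predict(X_train + dx) - y_hat) ** 2))
    sensitivity = np.mean(diffs)

    # loose additive combination, used here only to compare candidate models
    return train_err + sensitivity
```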

The architecture design of RBFNNs

In the sequel, we confine ourselves to RBFNNs with Gaussian receptive fields. We apply a standard clustering algorithm (say, k-means, self-organizing maps, etc.) to find the locations of the receptive fields of the network. Typically, this is done once the number of receptive fields has been fixed. The choice of this number is not a trivial task, and its suitable selection impacts the generalization abilities of the network. To address this issue, we discuss a new algorithm which will lead to …
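
The overall selection loop can be sketched as follows (our simplified reading, not the paper's algorithm verbatim; `fit_rbfnn` and `score_fn` are user-supplied placeholders, e.g. the localized error estimator sketched above):

```python
# Sketch: pick the number of receptive fields M that minimizes an estimated
# localized generalization error, using k-means to locate the centres.
import numpy as np
from sklearn.cluster import KMeans

def select_num_receptive_fields(X, y, candidate_M, fit_rbfnn, score_fn):
    best = (None, np.inf, None)                        # (M, score, model)
    for M in candidate_M:
        centres = KMeans(n_clusters=M, n_init=10).fit(X).cluster_centers_
        model = fit_rbfnn(X, y, centres)               # fit widths and output weights
        score = score_fn(model, X, y)                  # lower = better estimate
        if score < best[1]:
            best = (M, score, model)
    return best
```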

Experiments

In this section, we report on a series of experiments. First, we describe the experimental setup. Next, we report the experimental results and focus on the interpretation of the network.

Conclusions

In this study, being motivated by the concept that semantically similar images should exhibit similarity in the feature space, we proposed an application of the localized generalization error model to image classification. This model captures the generalization error for unseen samples that are similar to the training samples. Experimental results show that the RBFNN trained by minimizing the localized generalization error outperforms the “standard” RBFNN and a multi-class SVM. Moreover, …

Acknowledgments

This work is supported by a Hong Kong Polytechnic University Interfaculty Research Grant No. G-T891 and Canada Research Chair (W. Pedrycz).

References (24)

  • T. Hastie et al., The Elements of Statistical Learning, 2001.
  • V. Vapnik, Statistical Learning Theory, 1998.
  • V. Cherkassky et al., Model complexity control for regression using VC generalization bounds, IEEE Trans. Neural Networks, 1999.
  • B.S. Manjunath et al., Introduction to MPEG-7 Multimedia Content Description Interface, 2002.
  • E. Izquierdo, I. Damnjanovic, P. Villegas, X. Li-Qun, S. Herrmann, Bringing user satisfaction to media access: the 1st...
  • V. Mezaris, H. Doulaverakis, R.M.B. de Otalora, S. Herrmann, I. Kompatsiaris, M.G. Strintzis, A test-bed for...
  • S.-F. Chang et al., Overview of the MPEG-7 standard, IEEE Trans. Circuits Systems Video Technol., 2001.
  • A. Dorado, W. Pedrycz, E. Izquierdo, An MPEG-7 learning space for semantic image classification, Proceedings of the...
  • B.S. Manjunath et al., Color and texture descriptors, IEEE Trans. Circuits Systems Video Technol., 2001.
  • A. Barla, F. Odone, A. Verri, Old fashioned state-of-the-art image classification, IEEE Proceedings of International...
  • A.K. Jain et al., Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell., 2000.
  • O. Chapelle et al., Support vector machines for histogram-based image classification, IEEE Trans. Neural Networks, 1999.

Cited by (97)

    • Maximizing minority accuracy for imbalanced pattern classification problems using cost-sensitive Localized Generalization Error Model

      2021, Applied Soft Computing
      Citation Excerpt:

      This is a challenging task if both the data distribution and the costs may change over time. For instance, image classification problems [47] are usually a multi-class imbalanced problems but most of current methods ignore the imbalance issue in different classes. The application of the c-LGEM to multi-class image classification problem may focus on the very large number of classes issue which leads to a very imbalanced classification problem for each class.

    • Design methodology for Radial Basis Function Neural Networks classifier based on locally linear reconstruction and Conditional Fuzzy C-Means clustering

      2019, International Journal of Approximate Reasoning
      Citation Excerpt:

      Fuzzy radial basis function neural networks (FRBFNNs) form fuzzy neural networks. FRBFNNs have been used widely in various areas such as system modeling, control, and classification [2,5–13]. FRBFNNs are another type of hybrid system, which stems from the fuzzy inference system and neural networks.

    • Modeling of CO₂ solubility in MEA, DEA, TEA, and MDEA aqueous solutions using AdaBoost-Decision Tree and Artificial Neural Network

      2017, International Journal of Greenhouse Gas Control
      Citation Excerpt:

      SVM is categorized as a supervised method of machine learning. In cases associated with estimation of function, regression analysis, and classification, SVMs are attractive approaches (Jeng, 2006; Wing et al., 2007; Tsai and Sun, 2007; Acevedo-Rodríguez et al., 2009; Ceperic et al., 2012; Huang et al., 2004; Li et al., 2009; Stoean and Stoean, 2013; Subasi and Ismail Gursoy, 2010). Meyer et al. (2003) have compared the SVM to 16 classifiers and 9 regression approaches.
