Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error
Introduction
Digital images have become omnipresent, and this calls for an intensified effort toward building efficient mechanisms for their automatic annotation and retrieval. Classification of digital images is one of the fundamental activities and can be viewed as a prerequisite for most other image processing pursuits. In image classification we can follow the general paradigm of pattern recognition: each object is described by a collection of features that forms a multidimensional space in which all discrimination activities take place. Various classifiers, both linear and nonlinear, are available at this stage, including support vector machines (SVMs), linear classifiers, polynomial classifiers, radial basis function neural networks (RBFNNs), fuzzy rule-based systems, etc. No matter which classifier is chosen, the formation of a suitable feature space is of paramount relevance. Forming the feature space is even more complicated in the case of images. On the one hand, there are many different alternatives to choose from; on the other hand, the diversity of images contributes to an elevated level of complexity and difficulty: we encounter images showing different shapes, colors, textures, etc. that nevertheless belong to the same class. An image could be described by the intensity and color of each pixel or, better, by some descriptors. In this study, our objective is to explore and quantify the discriminatory properties of the MPEG-7 image descriptors in classification problems. These are explored in conjunction with two main categories of classifiers, namely SVMs and RBFNNs.
Unfortunately, a classifier achieving high training accuracy may not achieve good generalization capability. Since both the target outputs and the distribution of the unseen samples are unknown, it is impossible to compute the generalization error directly. There are two major approaches to estimating the generalization error, namely analytical models and cross-validation (CV). In general, analytical models bound the generalization error from above for arbitrary unseen samples and do not distinguish between trained classifiers with the same number of effective parameters but different parameter values; the error bounds given by these models are therefore usually loose [1]. The major problem of analytical models is the estimation of the number of effective parameters of the classifier, which could be addressed by using the VC-dimension [2]. The VC-dimension of a classifier is defined as the largest number of samples that can be shattered by that classifier [2]. However, only loose bounds on the VC-dimension can be found for nonlinear classifiers, e.g. neural networks, and this puts a severe limitation on the applicability of analytical models to nonlinear classifiers, with the exception of the SVM [3]. Although CV uses the true target outputs of unseen samples, it is time consuming for large datasets: CL classifiers must be trained for C-fold CV with L choices of classifier parameters. Moreover, CV methods estimate the expected generalization error instead of its bound, so they do not guarantee that the finally built classifier has good generalization capability [1].
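The CL training cost mentioned above can be made concrete with a small sketch. This is our own illustration, not the paper's procedure: `train` and `error` are hypothetical stand-ins for a real classifier and its held-out error estimate; only the bookkeeping (C folds times L parameter choices) is the point.

```python
# Sketch of C-fold cross-validated parameter selection.  The classifier
# here is a hypothetical stand-in; the point is that C folds x L parameter
# choices require C*L separate training runs.

def train(samples, param):
    """Stand-in 'training': just remember the samples and the parameter."""
    return {"samples": samples, "param": param}

def error(model, fold):
    """Stand-in held-out error estimate (purely illustrative)."""
    return sum(abs(x - model["param"]) for x, _ in fold) / len(fold)

def cv_select(data, params, C=5):
    folds = [data[i::C] for i in range(C)]
    trainings = 0
    best_param, best_err = None, float("inf")
    for p in params:                      # L choices of classifier parameters
        fold_errs = []
        for i in range(C):                # C-fold cross-validation
            held_out = folds[i]
            train_set = [s for j, f in enumerate(folds) if j != i for s in f]
            model = train(train_set, p)   # one training run
            trainings += 1
            fold_errs.append(error(model, held_out))
        avg = sum(fold_errs) / C
        if avg < best_err:
            best_param, best_err = p, avg
    return best_param, trainings

data = [(i / 40, 0) for i in range(40)]
best, n_train = cv_select(data, params=[0.2, 0.4, 0.6, 0.8], C=5)
print(n_train)  # 5 folds x 4 parameter choices = 20 trainings
```

For a large image dataset, each of those CL runs is a full training of the classifier, which is what makes CV expensive in practice.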
In image classification, one may not expect a classifier trained on one category of images (say, animals) to correctly classify images from other categories (e.g. vegetables). In this case, one may revise the training dataset by adding training samples of vegetables and re-train the classifier to include the new class of images. For example, our dataset contains images of cows but not airplanes, so we cannot expect a classifier trained on it to correctly recognize an airplane. An image classifier is expected to work well for the classes it was trained on, under the assumption that images belonging to the same class are conceptually similar, so that their descriptor values should also be similar. That is, unseen samples that are similar to the training samples, in the sense that a sup-type distance in the feature space is smaller than a given threshold, are considered to be more important. Thus, when evaluating the generalization capabilities of image classifiers, one may ignore images that are totally dissimilar to those present in the training set.
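The neighborhood idea can be sketched in a few lines. This is a minimal illustration under our own naming (`sup_distance`, `within_q_neighborhood` are hypothetical helpers, and Q is the similarity threshold from the text): an unseen sample counts toward the localized evaluation only if its sup-type (L-infinity) distance to some training sample is below Q.

```python
# Minimal sketch: restrict evaluation to unseen samples lying within a
# sup-type (L-infinity) distance Q of at least one training sample.

def sup_distance(x, y):
    """Sup-type (L-infinity) distance between two feature vectors."""
    return max(abs(a - b) for a, b in zip(x, y))

def within_q_neighborhood(unseen, training, q):
    """True if the unseen sample is within distance q of some training sample."""
    return any(sup_distance(unseen, t) <= q for t in training)

training = [(0.1, 0.2), (0.8, 0.9)]
print(within_q_neighborhood((0.15, 0.25), training, q=0.1))  # True: near a sample
print(within_q_neighborhood((0.5, 0.5), training, q=0.1))    # False: dissimilar
```

Images falling outside every such Q-neighborhood play the role of the "airplane" example above: they are simply excluded from the generalization assessment.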
In general, image classification problems are multi-class problems, and it is difficult to find a classifier with good generalization properties for them. In this work, we aim to build an image classifier featuring better generalization capability and interpretability with respect to domain knowledge in image classification. We concentrate on finding an optimal number of receptive fields for RBFNNs so that the resulting classifier exhibits a lower generalization error on unseen images.
We organize the study in the following manner. The starting point is a discussion on the formation of the feature space based upon the framework of descriptors available in the MPEG-7 standard. These issues are covered in Section 2. We provide a brief introduction to image classifiers in Section 3. The localized generalization error model and the corresponding approach to the selection of the architecture of the network are described in Sections 4 and 5, respectively. We present a comprehensive suite of experimental studies in Section 6. Concluding comments are covered in Section 7.
MPEG-7 feature space
In this section, we elaborate on the feature space arising within the framework of MPEG-7. The MPEG-7 descriptors are useful for low-level matching and provide a great flexibility for a wide range of applications.
Classifiers for image classification
In this section, we discuss several selected architectures of classifiers that are quite often encountered in image classification. It is of interest to investigate their properties in this setting and review some related development strategies.
A concept and realization of the localized generalization error
Given the anticipated diversity of images to be classified, one could easily envision that no classification algorithm is capable of carrying out zero-error classification. This straightforward and very intuitive observation applies in particular to images that are very different from those the classifier was exposed to during the training phase. In other words, we acknowledge that any classifier comes with limited generalization capabilities. In terms of the
The architecture design of RBFNNs
In the sequel, we confine ourselves to RBFNNs with Gaussian receptive fields. We apply a standard clustering algorithm (say, K-Means, self-organizing maps, etc.) to find the locations of the receptive fields of the network. Typically, this is done once the number of receptive fields has been fixed. The choice of this number is not a trivial task, and its suitable selection impacts the generalization abilities of the network. To address this issue, we discuss a new algorithm which will lead to
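The baseline construction described above can be sketched as follows. This is an illustrative implementation, not the paper's exact procedure: the K-Means initialization, the shared receptive-field width, and the least-squares fit of the output weights are our own simplifying choices, and the number of centers M — the quantity the proposed method selects by minimizing the localized generalization error — is here simply fixed by hand.

```python
import numpy as np

# Sketch of a Gaussian-RBFNN baseline: K-Means places the receptive-field
# centers, then the linear output weights are fitted by least squares.

def kmeans(X, m, iters=20):
    # Deterministic init (illustrative choice): evenly spaced sample indices.
    centers = X[np.linspace(0, len(X) - 1, m).astype(int)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(m):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def rbf_design(X, centers, width):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))       # Gaussian receptive fields

def fit_rbfnn(X, y, m, width=0.5):
    centers = kmeans(X, m)                       # locate receptive fields
    H = rbf_design(X, centers, width)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)    # output-layer weights
    return centers, w

def predict(X, centers, w, width=0.5):
    return rbf_design(X, centers, width) @ w

# Toy data: two well-separated blobs labeled -1 / +1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(1, 0.1, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
centers, w = fit_rbfnn(X, y, m=2)
acc = float(np.mean(np.sign(predict(X, centers, w)) == y))
print(acc)  # 1.0 on these well-separated blobs
```

In this sketch M = 2 is obvious from the data; the point of the paper's approach is precisely to choose M in a principled way when it is not.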
Experiments
In this section, we elaborate on a series of experiments. First, we describe the experimental setup. Next, we report the experimental results and focus on the interpretation of the network.
Conclusions
In this study, being motivated by the concept that semantically similar images should exhibit similarity in the feature space, we proposed an application of the localized generalization error model to image classification. This model captures the generalization error for unseen samples that are similar to the training samples. Experimental results show that the RBFNN trained using the minimization of the localized generalization error outperforms “standard” RBFNN and multi-class SVM. Moreover,
Acknowledgments
This work is supported by a Hong Kong Polytechnic University Interfaculty Research Grant No. G-T891 and Canada Research Chair (W. Pedrycz).
References (24)
- et al., The Elements of Statistical Learning (2001)
- Statistical Learning Theory (1998)
- et al., Model complexity control for regression using VC generalization bounds, IEEE Trans. Neural Networks (1999)
- et al., Introduction to MPEG-7 Multimedia Content Description Interface (2002)
- E. Izquierdo, I. Damnjanovic, P. Villegas, X. Li-Qun, S. Herrmann, Bringing user satisfaction to media access: the 1st...
- V. Mezaris, H. Doulaverakis, R.M.B. de Otalora, S. Herrmann, I. Kompatsiaris, M.G. Strintzis, A test-bed for...
- et al., Overview of the MPEG-7 standard, IEEE Trans. Circuits Systems Video Technol. (2001)
- A. Dorado, W. Pedrycz, E. Izquierdo, An MPEG-7 learning space for semantic image classification, Proceedings of the...
- et al., Color and texture descriptors, IEEE Trans. Circuits Systems Video Technol. (2001)
- A. Barla, F. Odone, A. Verri, Old fashioned state-of-the-art image classification, IEEE Proceedings of International...
- Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell.
- Support vector machines for histogram-based image classification, IEEE Trans. Neural Networks