Fuzzy classification using information theoretic learning vector quantization
Introduction
Prototype-based unsupervised vector quantization is an important task in pattern recognition. One basic advantage is the simple mapping scheme and the intuitive interpretation offered by the concept of representative prototypes. Several prototype-based methods have been established, ranging from statistical approaches to neural vector quantizers [10], [19], [27]. For neural vector quantizers, close connections to information theoretic learning can be drawn [3], [5], [16], [18], [26], [28]. Based on the fundamental work of Zador, distance-based vector quantization can be related to magnification, which describes the relation between data density and prototype density as a power law [45]. This relation can be used to design control strategies such that maximum mutual information between data and prototype density is obtained [1], [6], [39], [42]. However, this goal is achieved only as a side effect: it is not directly optimized by the learning schemes, because distance-based vector quantization methods originally minimize variants of the description error [45], which usually does not optimize any information theoretic criterion. The respective control strategies have to be installed in addition to the usual prototype adaptation scheme, which in turn may generate side effects contrary to the original goal of the vector quantizer (for instance, topographic mapping) [42].
Yet, vector quantization schemes that directly optimize information theoretic criteria become more and more important [5], [28], [40]. Two basic principles are widely used: maximization of the mutual information and minimization of divergence measures [24], [26]. Both criteria are equivalent for uniformly distributed data. Several entropy and divergence measures exist. Among the earliest, the Shannon entropy and the Kullback–Leibler divergence paved the way for the other methods [21], [34]. One famous entropy class is the class of α-entropies [29]. These entropies are generalizations of the Shannon entropy. Introduced by A. Rényi, they show interesting properties, which are of special interest for numerical computation as discussed later in the paper [46]. In particular, the quadratic α-entropy (α = 2) plays a distinguished role in this direction. Other divergence measures can be obtained using concepts of functional norms and their mathematical properties. J. Principe and colleagues have shown that, based on the Cauchy–Schwarz inequality for the functional L2-norm, a divergence measure can be derived which, together with a consistently chosen Parzen estimator for the unknown data densities, yields a numerically well-behaved approach to information-optimum prototype-based vector quantization [24].
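The Cauchy–Schwarz divergence mentioned above, D_CS(p, q) = −log(∫pq / √(∫p² ∫q²)), can be estimated directly from samples when both densities are modeled by Gaussian Parzen estimators, because the required integrals reduce to pairwise kernel sums of doubled variance. The following is a minimal numpy sketch, not the paper's implementation; the function names and the shared kernel width `sigma` are our assumptions:

```python
import numpy as np

def gauss_kernel(diff, sigma2):
    # isotropic Gaussian kernel values for an array of pairwise differences (n, m, d)
    d = diff.shape[-1]
    norm = (2.0 * np.pi * sigma2) ** (d / 2.0)
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma2)) / norm

def cs_divergence(X, Y, sigma=1.0):
    """Sample-based Cauchy-Schwarz divergence between Gaussian Parzen
    estimates of the densities underlying samples X (n, d) and Y (m, d)."""
    s2 = 2.0 * sigma**2  # convolving two Gaussians of width sigma doubles the variance
    pq = gauss_kernel(X[:, None, :] - Y[None, :, :], s2).mean()  # estimates integral of p*q
    pp = gauss_kernel(X[:, None, :] - X[None, :, :], s2).mean()  # estimates integral of p^2
    qq = gauss_kernel(Y[:, None, :] - Y[None, :, :], s2).mean()  # estimates integral of q^2
    return -np.log(pq / np.sqrt(pp * qq))
```

By the Cauchy–Schwarz inequality the argument of the logarithm is at most one, so the divergence is nonnegative and vanishes exactly when the two Parzen estimates coincide.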
In this contribution, we first extend this approach of information theoretic vector quantization such that it is applicable to more general functional norms, keeping the prototype-based principle of vector quantization as basis. Thus a broader range of applications becomes possible. For example, data equipped with only a pairwise similarity measure become tractable in this more general view. Further, we allow the used similarity measure or functional norm to depend on additional parameters. In this way we obtain greater flexibility through a free choice of the parameters. Moreover, we are then able to optimize the metric itself in parallel with the prototype distribution and, hence, the whole vector quantization model of the given data with respect to these metric parameters. Thus an information-processing-optimum metric can be achieved.
This strategy of task-dependent metric adaptation is known in supervised learning vector quantization (LVQ) as relevance learning.
The main contribution of this paper is that we extend the original approach of unsupervised information theoretic vector quantization introduced by J. Principe and colleagues to a supervised learning scheme. Thus, we transfer the ideas from unsupervised information theoretic vector quantization to an information theoretic LVQ approach, i.e. a classification or, equivalently, supervised learning scheme. Thereby, we allow the classification information of both data and prototypes to be fuzzy, i.e. we do not assume a crisp class decision for the training data or the adapted prototypes. Finally, we end up with a prototype-based fuzzy classifier. This is an improvement over standard LVQ approaches, which usually provide crisp decisions, cannot handle fuzzy labels for data, or are not based on information theoretic principles.
The paper is organized as follows: first we review the approach of unsupervised information theoretic vector quantization introduced by J. Principe and colleagues, but in the more general variant of arbitrary functional metrics in Hilbert spaces. Subsequently, we explain the new model for a supervised fuzzy classification scheme based on the unsupervised method and show how metric adaptation (relevance learning) can be integrated. Numerical experiments on artificial and real-world data demonstrate the abilities of the new classification system.
Information theoretic unsupervised vector quantization based on functional norms using the Hölder-inequality
In the following we briefly review the derivation of a numerically well-behaved divergence measure as proposed by J. Principe. It differs in some properties from the well-known Kullback–Leibler divergence. However, it also vanishes for identical probability densities and can therefore be used in density-matching optimization tasks like prototype-based vector quantization.
Let us start with the Shannon entropy in differential form for a density function p, H(p) = −∫ p(v) log p(v) dv.
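The quadratic Rényi entropy H2(p) = −log ∫p² is numerically attractive because, for a Gaussian Parzen estimate, the integral ∫p² reduces exactly to a double sum of Gaussians of doubled variance (the "information potential"). A minimal numpy sketch under those assumptions; the function name is ours:

```python
import numpy as np

def renyi2_entropy(X, sigma=1.0):
    """Quadratic Renyi entropy H2 = -log(integral of p^2) of a Gaussian
    Parzen estimate built from the samples X (n, d); the integral is the
    mean pairwise Gaussian of doubled variance (the information potential)."""
    n, d = X.shape
    s2 = 2.0 * sigma**2
    diff = X[:, None, :] - X[None, :, :]
    norm = (2.0 * np.pi * s2) ** (d / 2.0)
    ip = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * s2)).sum() / (n * n * norm)
    return -np.log(ip)  # small information potential -> widely spread data -> high entropy
```

Because no plug-in estimate of log p is needed, this quantity avoids the numerical difficulties of estimating the Shannon entropy directly from samples.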
Prototype based classification using Cauchy–Schwarz-divergence
In the following we extend the above approach to the task of prototype-based classification. Prototype-based classification is a very intuitive and robust method [43]. It includes the LVQ algorithms introduced by Kohonen [19]. However, standard LVQ does not follow a gradient of any cost function; the classification error is reduced based on heuristics only. For overlapping classes this heuristic causes instabilities [31]. Several modifications have been proposed to overcome this problem [31].
Metric adaptation—relevance learning
Up to now, we formulated the algorithm for general difference-based distance measures d(v, w). Usually, the Euclidean distance is applied. However, it is possible to use more complicated difference-based distance measures. For example, one can consider an arbitrary, parameterized distance measure d_λ(v, w) with a parameter vector λ = (λ1, …, λd), λi ≥ 0 and Σi λi = 1. We assume that d_λ is continuously differentiable. An important example is the scaled (quadratic) Euclidean metric d_λ(v, w) = Σi λi (vi − wi)².
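Relevance learning with the scaled quadratic Euclidean metric can be sketched in a few lines: the gradient of d_λ with respect to λ is simply the vector of squared coordinate differences, and after each gradient step the weights are projected back onto the constraint set (nonnegative, summing to one). A minimal illustrative sketch with our own function names, not the paper's update rule:

```python
import numpy as np

def scaled_sqeuclidean(v, w, lam):
    """Scaled quadratic Euclidean distance d_lambda(v, w) = sum_i lam_i (v_i - w_i)^2
    with relevance weights lam_i >= 0 that sum to one."""
    return np.sum(lam * (v - w) ** 2)

def normalize_relevances(lam):
    # project the relevance vector back onto the constraint set
    lam = np.maximum(lam, 0.0)
    return lam / lam.sum()

def relevance_step(v, w, lam, eta=0.01):
    # the gradient of d_lambda w.r.t. lam is (v - w)^2; descending it
    # shrinks the weights of dimensions with large (noisy) differences
    lam = lam - eta * (v - w) ** 2
    return normalize_relevances(lam)
```

Setting a relevance weight to zero makes the corresponding input dimension invisible to the classifier, which is how irrelevant or noisy features are pruned.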
Artificial data and exemplary applications
In a first toy example we applied LVQ-CSD with the quadratic Euclidean distance as dissimilarity measure to classify data obtained from two overlapping two-dimensional Gaussian distributions, each of them defining a data class. The data were split equally into training and test sets. We used 10 prototypes with randomly initialized positions and fuzzy labels.
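A setup of this kind is easy to reproduce. The sketch below generates two overlapping 2-D Gaussian classes, splits them equally into training and test sets, and initializes 10 prototypes with random positions and random fuzzy label vectors; the class means, variances, and sample sizes are illustrative values of ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# two overlapping 2-D Gaussian classes (illustrative parameters)
n_per_class = 200
X = np.vstack([rng.normal([0.0, 0.0], 1.0, (n_per_class, 2)),
               rng.normal([1.5, 1.5], 1.0, (n_per_class, 2))])
y = np.repeat([0, 1], n_per_class)

# equal split into training and test indices
perm = rng.permutation(len(X))
train, test = perm[: len(X) // 2], perm[len(X) // 2 :]

# 10 prototypes with random positions and random fuzzy label vectors
n_proto = 10
W = rng.normal(0.75, 1.0, (n_proto, 2))
labels = rng.random((n_proto, 2))
labels /= labels.sum(axis=1, keepdims=True)  # each row is a fuzzy class assignment
```

The fuzzy label rows sum to one, so a prototype can express partial membership in both classes, which is exactly what crisp LVQ prototypes cannot do.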
One crucial point when using Parzen estimators is the adequate choice of the kernel size σ. Silverman's rule gives a rough estimate [35].
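Silverman's rule of thumb for a Gaussian kernel sets the width to σ̂ · (4 / ((d + 2) n))^(1/(d+4)), where σ̂ is an estimate of the data's standard deviation; in one dimension this reduces to the familiar ≈ 1.06 σ̂ n^(−1/5). A minimal sketch, using the per-dimension standard deviations averaged into a single scalar width (one common convention among several):

```python
import numpy as np

def silverman_bandwidth(X):
    """Silverman's rule-of-thumb kernel width for a Gaussian Parzen estimator:
    sigma = std * (4 / ((d + 2) * n)) ** (1 / (d + 4)),
    with the per-dimension standard deviations averaged to one scalar."""
    n, d = np.atleast_2d(X).shape
    std = np.mean(np.std(X, axis=0))
    return std * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
```

The rule is only a starting point: it is derived for roughly Gaussian data, and for strongly multimodal densities the resulting width tends to oversmooth.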
Conclusion
Based on the information theoretic approach of unsupervised vector quantization by density matching using the Cauchy–Schwarz divergence, we developed a new supervised learning vector quantization algorithm which is able to handle fuzzy labels for data as well as for prototypes. In first toy applications the algorithm shows valuable results. In a realistic medical application we have demonstrated the power and the abilities of the new classification scheme and outlined possible conclusions.
References (46)

- et al., Competitive learning algorithms for vector quantization, Neural Networks (1990)
- et al., Generalized relevance learning vector quantization, Neural Networks (2002)
- et al., A symmetric information divergence measure of the Csiszár's f-divergence class and its bounds, Comput. Math. Appl. (2005)
- et al., Relative information of type s, Csiszár's f-divergence, and information inequalities, Inf. Sci. (2004)
- et al., Prototype-based fuzzy classification with local relevance for proteomics, Neurocomputing (2006)
- et al., Comparison of relevance learning vector quantization with other metric adaptive classification methods, Neural Networks (2006)
- C.L. Blake, C.J. Merz, UCI repository of machine learning databases, University of California, Department of...
- Neuronale Netze [Neural Networks] (1995)
- Information-type measures of differences of probability distributions and indirect observations, Studia Sci. Math. Hungarica (1967)
- et al., An Information-Theoretic Approach to Neural Computing (1997)
- Supervised neural gas with general similarity measure, Neural Process. Lett.
- Neural Networks—A Comprehensive Foundation
- Comparison of clinical types of Wilson's disease and glucose metabolism in extrapyramidal motor brain regions, J. Neurol.
- Correlation between automated writing movements and striatal dopaminergic innervation in patients with Wilson's disease, J. Neurol.
- Pyramidale Schädigung im Vergleich zur extrapyramidalmotorischen Beeinträchtigung bei Patienten mit Morbus Wilson [Pyramidal damage compared with extrapyramidal motor impairment in patients with Wilson's disease], Klin. Neurophysiol.
- Computergestützte Analyse der Handschrift bei Patienten mit Morbus Wilson [Computer-assisted analysis of handwriting in patients with Wilson's disease], Klin. Neurophysiol.
- Classification of fine-motoric disturbances in Wilson's disease using artificial neural networks, Acta Neurol. Scand.
- Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell.
- Measures of Information and their Application
Thomas Villmann is a senior researcher at the Medical Department, University of Leipzig, Germany and leads the Computational Intelligence group (http://www.unileipzig.de/∼compint/). He holds a Ph.D. and the venia legendi in Computer Science. His research areas comprise the theory of prototype-based vector quantization, neural networks and machine learning as well as respective applications in medical data analysis, bioinformatics and satellite remote sensing. Several research stays have taken him to Belgium, France, the Netherlands, and the USA. He is a founding member of the German chapter of the European Neural Network Society (GNNS).
Barbara Hammer received her Ph.D. in Computer Science in 1995 and her venia legendi in Computer Science in 2003, both from the University of Osnabrueck, Germany. From 2000 to 2004, she was leader of the junior research group ‘Learning with Neural Methods on Structured Data’ at the University of Osnabrueck. In 2004, she became Professor for Theoretical Computer Science at Clausthal University of Technology, Germany. Several research stays have taken her to Italy, the UK, India, France, and the USA. Her areas of expertise include hybrid systems, self-organizing maps, clustering, recurrent networks and their applications in bioinformatics, industrial process monitoring, and cognitive science.
Frank-Michael Schleif studied Computer Science at Leipzig University, graduating in 2002. He then became a Ph.D. student at Leipzig University. In 2003, he joined Bruker Biosciences and continued his Ph.D. studies at the Clausthal University of Technology. In 2006, he received a Ph.D. in Computer Science, and his dissertation was awarded best Computer Science Ph.D. thesis at TUC. He is currently a research scientist in the MetaSTEM project team at IZKF (Interdisciplinary Center for Clinical Research) and a member of the Computational Intelligence Group at the Medical Department of Leipzig University. His research activities focus on machine learning methods, statistical data analysis and algorithm development.
Wieland Hermann is a neurologist and head of the neurology department at the Paracelsus Hospital, Zwickau. He holds a doctoral degree and a venia legendi, both from Leipzig University. His research areas include extrapyramidal symptoms, Wilson's disease, and related topics.
Marie Cottrell was born in Béthune, France in 1943. She was a student at the Ecole Normale Supérieure de Sèvres, and received the Agrégation de Mathématiques degree in 1964 and the Thèse d’Etat (Modélisations de réseaux de neurones par des chaînes de Markov et autres applications) in 1988. From 1964 to 1967, she was a high school teacher. From 1967 to 1988, she was successively an assistant and an assistant professor at the University of Paris and at the University of Paris-Sud (Orsay), except from 1970 to 1973, during which she was a professor at the University of Havana, Cuba. Since 1989, she has been a full professor at the University Paris 1-Panthéon-Sorbonne. Her research interests include stochastic algorithms, large deviation theory, biomathematics, data analysis, and statistics. Since 1986, her main work has dealt with artificial and biological neural networks, Kohonen maps and their applications in data analysis. She is the author of about 70 publications in this field. She is in charge of a research group at the University Paris 1 (the SAMOS). She is regularly solicited as a referee or as a program committee member of international conferences.