The effect of finite sample size on on-line K-means
Introduction
On-line algorithms are among the simplest optimization processes in the learning phase of artificial adaptive systems. This feature makes them attractive for handling large (e.g. real-world) training sets using moderate computational effort. In these cases, since the number of examples (N) is usually fixed, the learning algorithm stores the complete training data in memory and passes (e.g. cyclically) through them over and over until a stopping criterion is met.
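As a concrete picture of this scenario, the following minimal sketch (ours, not the paper's code; the update rule, step-size schedule and stopping test are generic placeholders) cycles through a stored training set until convergence:

```python
import numpy as np

def cyclic_online_training(data, params, update, step_size, max_epochs=100, tol=1e-6):
    """Pass cyclically through a fixed training set of N examples,
    applying an on-line update after every single example."""
    t = 0
    for epoch in range(max_epochs):
        old = params.copy()
        for x in data:                            # one cyclic pass over the N stored examples
            params = update(params, x, step_size(t))
            t += 1
        if np.linalg.norm(params - old) < tol:    # stopping criterion
            break
    return params
```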
In some cases this learning scenario is unavoidable, since the derivation of the algorithm is intrinsically on-line (e.g. supervised LVQ algorithms [12]). In other cases, where the algorithm is related to a particular cost function, its use is supported by the empirical evidence that convergence is faster than for batch versions when the data are redundant, a typical situation in real-life data [3], [4]. Another typical argument in favor of on-line versions is that they can be considered 'noisy' versions of batch algorithms, and so may escape from local minima more easily.
Although the use of on-line learning algorithms is common practice when there is cyclic or random access to a fixed set of examples, there is no guarantee of convergence. From a theoretical point of view, convergence is only guaranteed when N tends to infinity [15]. (This holds for both constant and decreasing step sizes [3].) Moreover, the theory that studies these systems does not usually provide any hints about the practical convergence rate. Hence, to get more insight into the finite-sample properties of these algorithms, theoretical analysis must always be complemented by simulation studies.
This paper addresses the study of the finite-sample convergence of the popular K-means clustering algorithm [16]. This algorithm is widely employed in vector quantization [10], as an initialization for other, more powerful learning systems (like radial basis function networks [17] or Kohonen's LVQ algorithms [12]), and also as a learning algorithm for data-dependent partitioning classifiers [9]. We study the on-line version under cyclic or any other presentation of the training data, with a constant or variable step size. Emphasis is placed on comparing the cyclic on-line version with the batch and infinite-sample on-line versions. For the study of the learning algorithm we make a non-statistical analysis, based mainly on discrete-time dynamical systems theory [8], [18], [19].
The organization of this paper is as follows. Section 2 reviews on-line and batch versions of the K-means algorithm. In Section 3, we present our study of the finite-sample convergence introducing an asymptotic first-order model of the on-line K-means. To give a complete view, we also include convergence studies of the other versions. Section 4 briefly describes the generalization performance of these algorithms. Next, experimental results are presented in order to validate the proposed model of the finite-sample on-line K-means. Finally, we include some discussion and conclusions.
Section snippets
Optimal K-means
We wish to design a codebook (or set of prototypes of size K) for a vector quantizer VQ. A VQ of dimension p and size K is defined as a mapping from a p-dimensional Euclidean space, $\mathbb{R}^p$, into a set or codebook $C = \{m_1, \ldots, m_K\}$, $m_j \in \mathbb{R}^p$. Associated with every code vector $m_j$ there is a region of influence $R_j$ where VQ maps any input vector that falls into it to $m_j$. Since we use a nearest neighbor quantizer, $R_j$ is defined by
$$R_j = \{x \in \mathbb{R}^p : \lVert x - m_j \rVert \le \lVert x - m_i \rVert,\ \forall i \neq j\}.$$
Thus, VQ can be expressed as
$$\mathrm{VQ}(x) = m_j \quad \text{if } x \in R_j.$$
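A minimal sketch of this nearest-neighbor quantizer (our Python, with hypothetical names; ties are broken towards the lowest index):

```python
import numpy as np

def vq(x, codebook):
    """Nearest-neighbor quantizer: map x in R^p to the closest of the
    K codevectors m_1..m_K, i.e. the m_j whose region R_j contains x."""
    j = np.argmin(np.linalg.norm(codebook - x, axis=1))  # ties -> lowest index
    return codebook[j]
```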
Study of convergence
K-means works in input regions of high probability (i.e. regions with a high density of input patterns), placing codevectors so as to approximate discretely the probability density or, if the real density is unknown, the empirical density of the samples observed in the training set.
In this section, we study the convergence of the three versions of K-means presented in Section 2 and the relations between their three solutions. More precisely, we will present a new study of the convergence of
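As a concrete reference for these versions, here is a minimal sketch (ours, not the paper's code) of the two finite-sample updates: the batch step, which moves each codevector to the empirical mean of its cell, and the on-line step, which nudges only the winning codevector towards the current example.

```python
import numpy as np

def batch_kmeans_step(codebook, data):
    """One batch step: assign every sample to its nearest codevector,
    then move each codevector to the empirical mean of its cell."""
    assign = np.argmin(((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
    new = codebook.copy()
    for j in range(len(codebook)):
        cell = data[assign == j]
        if len(cell) > 0:           # leave empty cells untouched
            new[j] = cell.mean(axis=0)
    return new

def online_kmeans_step(codebook, x, alpha):
    """One on-line step: move only the winning codevector towards x."""
    j = np.argmin(((codebook - x) ** 2).sum(-1))
    codebook[j] += alpha * (x - codebook[j])
    return codebook
```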
Generalization properties
The K-means learning model approximates locally the random variable $X$ by
$$\hat{X} = \mathrm{VQ}(X) = m_{j(X)},$$
where $m_{j(X)}$ is the nearest codevector to $X$.
The expected quantization error will measure how well we approximate $X$ for all possible cases,
$$D(C) = \mathbb{E}\,\lVert X - \mathrm{VQ}(X) \rVert^2.$$
It is easy to show that this error function can be decomposed as the sum of an approximation error and an estimation error,
$$D(\hat{C}_K) = D(C^*_K) + \bigl(D(\hat{C}_K) - D(C^*_K)\bigr),$$
where $C^*_K$ is the optimal codebook of size $K$ and $\hat{C}_K$ the codebook learned from the training set. The approximation error is the error induced by the kind
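For intuition, the expected quantization error of a given codebook can be estimated by Monte Carlo over a sample; a small sketch (ours):

```python
import numpy as np

def quantization_error(codebook, samples):
    """Monte Carlo estimate of E||X - VQ(X)||^2: average squared distance
    of each sample to its nearest codevector."""
    d2 = ((samples[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean()
```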
Experimental results
In the experimental part of our work, we study the real convergence properties of the on-line K-means algorithm using artificial data, in order to see how good the linearized model of K-means is near the basin of attraction. Since the simplest and most intuitive expressions have been derived for constant step size and cyclic sampling of the training set, we only perform simulations in this particular case.
Training data were sampled from a 2-dimensional normal distribution with mean and the
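A hedged reconstruction of this experiment (the Gaussian parameters below are placeholders; the snippet omits the actual mean and covariance): sample the training set once, then run on-line K-means with cyclic presentation and constant step size, and compare the resulting fixed points with the empirical means of the final cells.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder parameters: the paper's actual mean/covariance are not in the snippet.
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2), size=500)

K, alpha, epochs = 4, 0.05, 200
codebook = data[rng.choice(len(data), K, replace=False)].copy()

for _ in range(epochs):                       # cyclic presentation, constant step size
    for x in data:                            # fixed order of the N training points
        j = np.argmin(((codebook - x) ** 2).sum(-1))
        codebook[j] += alpha * (x - codebook[j])

print(codebook)   # compare against the empirical means of the final cells
```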
On-line K-means for constant step size and cyclic sampling
The step size α affects the fixed points and their stability considerably. But it is the relation between α and Nj (the number of training points used to compute the fixed point) that determines the fixed point's behavior and value. If this quantity is small enough (e.g. <2), the fixed points are stable and their values tend to the empirical estimators of their optimal counterparts. Otherwise, the fixed points tend to move slightly away from these estimators. Each training example
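The closed formula mentioned in the Conclusions can be sketched as follows (our derivation, assuming the linear model in which the assignment of the $N_j$ cell points $x_1, \ldots, x_{N_j}$ to codevector $j$ is frozen near the attractor). One cyclic epoch of updates $m_j \leftarrow (1-\alpha)\, m_j + \alpha\, x_i$ composes to
$$m_j(t+1) = (1-\alpha)^{N_j}\, m_j(t) + \alpha \sum_{i=1}^{N_j} (1-\alpha)^{N_j - i}\, x_i ,$$
whose fixed point is the weighted conditioned mean
$$m_j^{*} = \frac{\alpha \sum_{i=1}^{N_j} (1-\alpha)^{N_j - i}\, x_i}{1 - (1-\alpha)^{N_j}} .$$
As $\alpha \to 0$ the weights flatten and $m_j^{*}$ tends to the empirical mean $\tfrac{1}{N_j}\sum_i x_i$; the epoch map is stable whenever $|1-\alpha|^{N_j} < 1$, i.e. $0 < \alpha < 2$.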
Conclusions
General expressions for the finite-sample convergence of the on-line K-means algorithm have been presented, where the fixed points of K-means are weighted conditioned means that depend on the training data, the step size function and the method of sampling. In particular, we have derived a closed formula for cyclic presentation and constant step size using a linear model which is valid near the basin of attraction of the non-linear discrete-time dynamical system. In fact, we have
Acknowledgements
The authors acknowledge the valuable comments of the reviewers on a previous version of this paper, which helped to improve the presentation of the mathematics. This research was supported in part by Spanish CICYT action TIC96-0889.
References (19)
- S. Bermejo et al., Finite-sample convergence of on-line LVQ1 and the BLVQ1 algorithm, Neural Process. Lett. (2001)
- Y. Bengio, Artificial neural networks and their application to sequence recognition, Ph.D. Thesis, Department of...
- A. Benveniste et al., Adaptive Algorithms and Stochastic Approximations (1990)
- C.M. Bishop, Neural Networks for Pattern Recognition (1995)
- L. Bottou et al., Convergence properties of k-means
- L. Bottou, Online Learning and Stochastic Approximations
- J.E. Dennis et al., A view of unconstrained optimization
- R.L. Devaney, An Introduction to Chaotic Dynamical Systems (1989)
- L. Devroye et al., A Probabilistic Theory of Pattern Recognition (1996)
Cited by (11)
The incremental online k-means clustering algorithm and its application to color quantization
2022, Expert Systems with Applications
Citation Excerpt: Note that, unlike BKM, OKM traverses the data points in random order, which aims to reduce OKM’s sensitivity to the order in which the data points are processed. Studies have shown that for online learning algorithms like OKM, random traversal is preferable to cyclical traversal, which is used in BKM (Bermejo & Cabestany, 2002). This is because cyclical presentation may bias an online learning algorithm, especially when dealing with redundant data sets such as image data.
Smart motion detection sensor based on video processing using self-organizing maps
2016, Expert Systems with Applications
Citation Excerpt: In particular, it has been applied to several areas of computer vision, such as color quantization (Dekker, 1994; Palomo & Domínguez, 2014; Papamarkos, 1999; Xiao, Leung, Lam, & Ho, 2012), and image segmentation (Bhandarkar, Koh, & Suk, 1997; Dong & Xie, 2005; Lacerda & Mello, 2013; Maddalena & Petrosino, 2008a). The SOM is based on an incremental (online) learning process, which has better ability to escape from local minima than batch learning (Bermejo & Cabestany, 2002) and consumes less computational time in color quantization problems (Chang, Pengfei, Xiao, & Srikanthan, 2005). Moreover, it has been employed previously to detect foreground objects in video sequences (López-Rubio, Luque-Baena, & Domínguez, 2011; Maddalena & Petrosino, 2008a).
Neural networks: An overview of early research, current frameworks and new challenges
2016, Neurocomputing
Citation Excerpt: Several excellent books dedicated to neural networks and machine learning were published in this period, such as those by Haykin [82] and Luo and Unbehauen [83]. In the fourth and last period, which began in approximately 2000 and continues until now, no models have become so popular and aroused such interest as those produced in previous phases; nevertheless the theoretical study of previous models has notably deepened, with exhaustive studies into topics such as convergence analysis, statistical equilibrium, stability [84–88], estimation of states and control of synchronization, aiming to optimize and improve the models [89–95]. The quantitative analysis of neural networks with discontinuous activation functions was also a hot topic in this period [96–99].
Sample-size adaptive self-organization map for color images quantization
2007, Pattern Recognition Letters

Forty years of color quantization: a modern, algorithmic survey
2023, Artificial Intelligence Review

Fast color quantization using MacQueen’s k-means algorithm
2020, Journal of Real-Time Image Processing
Sergio Bermejo received his M.Sc. and Ph.D. degrees in Telecommunications Engineering in 1996 and 2000, respectively, from the Universitat Politècnica de Catalunya (UPC). He holds an Assistant Professor position at UPC's Department of Electronic Engineering. His research interests include Statistical Pattern Recognition and Machine Learning.
Joan Cabestany currently holds a Professor position at the Department of Electronic Engineering of the Universitat Politècnica de Catalunya (UPC). He obtained his M.Sc. and Ph.D. degrees in Telecommunications Engineering in 1976 and 1982, respectively, both from the Universitat Politècnica de Catalunya. His research interests include Analog and Digital Electronic Systems Design, Configurable and Programmable Electronic Systems, and Neural Network Models and applications.