Abstract
Many problems in data analysis, especially in signal and image processing, require the unsupervised partitioning of data into a set of ‘self-similar’ classes or clusters. An ideal partitioning unambiguously assigns each datum to a single class, and one thinks of the data as being produced by a number of data generators, one for each class. Many algorithms have been proposed for such analysis and for estimating the optimal number of partitions. The majority of popular and computationally feasible techniques rely on the assumption that classes are hyper-ellipsoidal in shape. In the case of Gaussian mixture modelling [15,6] this assumption is explicit; in the case of dendrogram linkage methods (which typically rely on the L2 norm) it is implicit [9]. For some data sets this leads to over-partitioning. Alternative methods, based for example on valley seeking [6] or maxima-tracking in scale-space [16,18,13], have the advantage of being free from such assumptions. They can, however, be sensitive to noise and computationally intensive in high-dimensional spaces. In this paper we reconsider the issue of data partitioning from an information-theoretic viewpoint and show that minimisation of partition entropy may be used to evaluate the most probable set of data generators. Rather than formulate the problem as a traditional model-order estimation task, inferring the most probable number of classes directly, we employ a reversible jump mechanism in a Markov chain Monte Carlo (MCMC) sampler which explores the space of different model sizes.
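As an illustration of the partition-entropy criterion, the sketch below computes the mean Shannon entropy of the class posteriors p(k | x_n) produced by any mixture fit: an unambiguous assignment of every datum to a single class gives an entropy near zero, while heavily overlapping classes approach log K. The abstract does not spell out the exact quantity minimised, so this follows the ‘maximum certainty’ formulation of Roberts, Everson and Rezek [14]; the function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def partition_entropy(resp, eps=1e-12):
    """Mean Shannon entropy (in nats) of the class posteriors p(k | x_n).

    `resp` is an (N, K) array of responsibilities, one row per datum.
    A crisp partition gives a value near 0; complete ambiguity between
    K classes gives a value near log K.
    """
    resp = np.clip(resp, eps, 1.0)
    resp = resp / resp.sum(axis=1, keepdims=True)  # renormalise each row
    return float(-np.mean(np.sum(resp * np.log(resp), axis=1)))

# Toy usage: near-crisp assignments vs. heavily overlapping classes.
crisp = np.array([[0.99, 0.01], [0.02, 0.98], [0.97, 0.03]])
fuzzy = np.array([[0.55, 0.45], [0.48, 0.52], [0.60, 0.40]])
print(partition_entropy(crisp))  # close to 0
print(partition_entropy(fuzzy))  # close to log(2) ~= 0.693
```

Under this reading, candidate partitions (and, via the reversible jump moves, candidate numbers of generators) that yield lower partition entropy are favoured by the sampler.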
References
S. Aeberhard, D. Coomans, and O. de Vel. Comparative Analysis of Statistical Pattern Recognition Methods in High Dimensional Settings. Pattern Recognition, 27(8):1065–1077, 1994.
C. Andrieu, N. de Freitas, and A. Doucet. Sequential MCMC for Bayesian Model Selection. In IEEE Signal Processing Workshop on Higher Order Statistics, Caesarea, Israel, June 14–16, 1999.
J.M. Bernardo and A.F.M. Smith. Bayesian Theory. John Wiley, 1994.
C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.
A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. Roy. Stat. Soc., 39(1):1–38, 1977.
K. Fukunaga. An Introduction to Statistical Pattern Recognition. Academic Press, 1990.
P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711–732, 1995.
C. Holmes and B.K. Mallick. Bayesian Radial Basis Functions of variable dimension. Neural Computation, 10:1217–1233, 1998.
A.K. Jain and R.C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorisation. Nature, 401:788–791, October 1999.
R.M. Neal. Bayesian learning for neural networks. Lecture notes in statistics. Springer, Berlin, 1996.
S. Richardson and P.J. Green. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society (Series B), 59(4):731–758, 1997.
S.J. Roberts. Parametric and non-parametric unsupervised cluster analysis. Pattern Recognition, 30(2):261–272, 1997.
S.J. Roberts, R. Everson, and I. Rezek. Maximum Certainty Data Partitioning. Pattern Recognition, 33(5):833–839, 2000.
S.J. Roberts, D. Husmeier, I. Rezek, and W. Penny. Bayesian Approaches To Mixture Modelling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1133–1142, 1998.
K. Rose, E. Gurewitz, and G.C. Fox. A Deterministic Annealing Approach to Clustering. Pattern Recognition Letters, 11(9):589–594, September 1990.
L. Tierney. Markov Chains for exploring Posterior Distributions. Annals of Statistics, 22:1701–1762, 1994.
R. Wilson and M. Spann. A New Approach to Clustering. Pattern Recognition, 23(12):1413–1425, 1990.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Roberts, S.J., Holmes, C., Denison, D. (2001). Minimum-Entropy Data Clustering Using Reversible Jump Markov Chain Monte Carlo. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_15
Print ISBN: 978-3-540-42486-4
Online ISBN: 978-3-540-44668-2