Abstract
This paper describes a new approach to semi-supervised model-based clustering. The problem is formulated as penalized logistic regression in which the labels are observed only indirectly, through the component densities. This formulation yields a generalized EM algorithm with closed-form update equations, in contrast with related approaches that require expensive Gibbs sampling or suboptimal algorithms. We show how the approach applies naturally to image segmentation under spatial priors, avoiding the hard combinatorial optimization usually required by classical Markov random fields; this opens the door to sophisticated spatial priors (such as those based on wavelet representations) in a simple and computationally efficient way.
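As an illustration only, the following Python sketch shows one way labels can be "indirectly observed" through component densities: a per-sample logistic (softmax) prior over labels is combined with each component's likelihood to give posterior label probabilities, which then drive closed-form updates of the component parameters. The one-dimensional Gaussian components, the logits array `z`, and all function names are assumptions of this sketch, not details taken from the paper. The penalized update of the logits is omitted because the abstract does not specify it.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gaussian_pdf(x, mean, var):
    # Univariate Gaussian density (assumed component family for this sketch).
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def e_step(x, z, means, variances):
    # Posterior label probabilities: logistic prior times component likelihood,
    # normalized over the K components for each of the n samples.
    prior = softmax(z)                                        # shape (n, K)
    lik = gaussian_pdf(x[:, None], means[None, :], variances[None, :])
    joint = prior * lik
    return joint / joint.sum(axis=1, keepdims=True)

def m_step_components(x, resp):
    # Closed-form weighted mean and variance for each Gaussian component.
    nk = resp.sum(axis=0)
    means = (resp * x[:, None]).sum(axis=0) / nk
    variances = (resp * (x[:, None] - means[None, :]) ** 2).sum(axis=0) / nk
    return means, variances

# Toy usage: four samples, two components, uninformative logits.
x = np.array([0.0, 0.1, 5.0, 5.1])
z = np.zeros((4, 2))
means = np.array([0.0, 5.0])
variances = np.array([1.0, 1.0])
resp = e_step(x, z, means, variances)
new_means, new_variances = m_step_components(x, resp)
```

With uninformative logits, the posterior probabilities are determined by the component densities alone; informative logits (for example, ones encoding a spatial prior) would shift them accordingly.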
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Figueiredo, M.A.T. (2007). Semi-Supervised Clustering: Application to Image Segmentation. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_5
DOI: https://doi.org/10.1007/978-3-540-70981-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics (R0)