Adapting k-means for supervised clustering

Al-Harbi, S. H.; Rayward-Smith, V. J.

doi:10.1007/s10489-006-8513-8

Published: June 2006

Volume 24, pages 219–226, (2006)

Applied Intelligence

S. H. Al-Harbi¹ &
V. J. Rayward-Smith²

Abstract

k-means is traditionally viewed as an algorithm for the unsupervised clustering of a heterogeneous population into a number of more homogeneous groups of objects. However, it is not necessarily guaranteed to group the same types (classes) of objects together. In such cases, some supervision is needed to partition objects which have the same label into one cluster. This paper demonstrates how the popular k-means clustering algorithm can be profitably modified to be used as a classifier algorithm. The output field itself cannot be used in the clustering but it is used in developing a suitable metric defined on other fields. The proposed algorithm combines Simulated Annealing with the modified k-means algorithm. We apply the proposed algorithm to real data sets, and compare the output of the resultant classifier to that of C4.5.
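The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a weighted squared Euclidean metric over the input fields, cluster purity as the quantity being optimised, and a simple geometric cooling schedule. The names `weighted_sq_dist`, `kmeans`, `purity`, and `anneal_weights`, and all parameter defaults, are hypothetical. The class label is never used inside the distance; it is used only to score a candidate set of field weights.

```python
import math
import random

def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one non-negative weight per field."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def kmeans(X, k, w, iters=20):
    """Lloyd-style k-means under the weighted metric (labels are not used).
    Assumption: centroids start at the first k rows so runs are repeatable."""
    centroids = [list(x) for x in X[:k]]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: weighted_sq_dist(x, centroids[j], w))
                  for x in X]
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def purity(assign, y, k):
    """Fraction of objects whose label equals their cluster's majority label."""
    hits = 0
    for j in range(k):
        labels = [lab for lab, a in zip(y, assign) if a == j]
        if labels:
            hits += max(labels.count(lab) for lab in set(labels))
    return hits / len(y)

def anneal_weights(X, y, k, steps=200, t0=0.1, cooling=0.98, seed=0):
    """Simulated annealing over field weights, scored by cluster purity."""
    rng = random.Random(seed)
    w = [1.0] * len(X[0])          # start from the unweighted metric
    score = purity(kmeans(X, k, w), y, k)
    best_w, best_score = list(w), score
    t = t0
    for _ in range(steps):
        cand = list(w)
        j = rng.randrange(len(cand))
        cand[j] = max(0.0, cand[j] + rng.uniform(-0.5, 0.5))  # perturb one weight
        cand_score = purity(kmeans(X, k, cand), y, k)
        delta = cand_score - score
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            w, score = cand, cand_score
            if score > best_score:
                best_w, best_score = list(w), score
        t *= cooling               # geometric cooling (an assumption)
    return best_w, best_score
```

Calling `anneal_weights(X, y, k)` on a list of numeric rows `X` with labels `y` returns the best weights found and their purity. Because the search starts from uniform weights and tracks the best score seen, the returned purity is never lower than that of unweighted k-means with the same initialisation.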

Author information

Authors and Affiliations

S. H. Al-Harbi: Information Center, P O Box 21883, Riyadh, 11485, Saudi Arabia
V. J. Rayward-Smith: School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, England

Corresponding author

Correspondence to S. H. Al-Harbi.

About this article

Cite this article

Al-Harbi, S.H., Rayward-Smith, V.J. Adapting k-means for supervised clustering. Appl Intell 24, 219–226 (2006). https://doi.org/10.1007/s10489-006-8513-8

Issue Date: June 2006
DOI: https://doi.org/10.1007/s10489-006-8513-8
