Abstract
k-means is traditionally viewed as an unsupervised algorithm that clusters a heterogeneous population into a number of more homogeneous groups of objects. However, it is not guaranteed to group objects of the same type (class) together. In such cases, some supervision is needed so that objects with the same class label are placed in the same cluster. This paper demonstrates how the popular k-means clustering algorithm can be profitably modified for use as a classifier. The output field itself cannot be used in the clustering, but it is used to develop a suitable metric defined on the other fields. The proposed algorithm combines simulated annealing with the modified k-means algorithm. We also apply the proposed algorithm to real data sets, where it yields improvements in confidence compared with C4.5.
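The abstract does not give the details of the metric or of the annealing schedule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a weighted Euclidean metric on the input fields, uses cluster purity (the fraction of objects that carry the majority class label of their cluster) as a stand-in for the paper's confidence measure, and lets simulated annealing search over the field weights. All function names, the initialisation from the first k points, and the cooling parameters are hypothetical choices made for illustration.

```python
import math
import random


def weighted_dist(x, y, w):
    # Weighted Euclidean distance; the weights w are what the annealer tunes.
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))


def kmeans(points, k, w, iters=20):
    # Plain k-means under the weighted metric. The class labels are never
    # used here. Assumption: centroids start at the first k points.
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda c: weighted_dist(p, centroids[c], w))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def purity(assign, labels, k):
    # Fraction of points whose label matches the majority label of their
    # cluster (an assumed proxy for the paper's "confidence").
    correct = 0
    for c in range(k):
        in_cluster = [l for a, l in zip(assign, labels) if a == c]
        if in_cluster:
            correct += max(in_cluster.count(v) for v in set(in_cluster))
    return correct / len(labels)


def anneal_weights(points, labels, k, steps=200, t0=1.0, cooling=0.97, seed=0):
    # Simulated annealing over the field weights: the labels only influence
    # the metric, not the clustering step itself.
    rng = random.Random(seed)
    dim = len(points[0])
    w = [1.0] * dim
    score = purity(kmeans(points, k, w), labels, k)
    best_w, best_score = w[:], score
    t = t0
    for _ in range(steps):
        cand = w[:]
        i = rng.randrange(dim)
        cand[i] = max(0.0, cand[i] + rng.uniform(-0.5, 0.5))  # perturb one weight
        cand_score = purity(kmeans(points, k, cand), labels, k)
        delta = cand_score - score
        # Always accept improvements; accept worse moves with probability
        # exp(delta / t), which shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            w, score = cand, cand_score
            if score > best_score:
                best_w, best_score = w[:], score
        t *= cooling
    return best_w, best_score
```

Because the best weight vector seen so far is tracked separately from the current one, the returned purity can never be lower than that of unweighted k-means from the same starting centroids. A new object would then be classified by assigning it to its nearest centroid under the learned weights and reporting that cluster's majority label.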
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Al-Harbi, S.H., Rayward-Smith, V.J. (2003). The Use of a Supervised k-Means Algorithm on Real-Valued Data with Applications in Health. In: Chung, P.W.H., Hinde, C., Ali, M. (eds) Developments in Applied Artificial Intelligence. IEA/AIE 2003. Lecture Notes in Computer Science, vol 2718. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45034-3_58
DOI: https://doi.org/10.1007/3-540-45034-3_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40455-2
Online ISBN: 978-3-540-45034-4
eBook Packages: Springer Book Archive