Abstract
In recent years, researchers have turned their attention to a new approach, supervised clustering, which combines the main characteristics of traditional clustering and supervised classification. Motivated by the importance of pre-processing in traditional clustering, this paper explores to what extent supervised pre-processing steps can help traditional clustering algorithms perform better on supervised clustering tasks. The reported experiments show that standard clustering algorithms are indeed competitive with existing supervised clustering algorithms when supervised pre-processing steps are applied beforehand.
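The abstract describes the pipeline only at a high level. The sketch below illustrates the general idea (supervised pre-processing, then an unmodified clustering algorithm) under stated assumptions; it is not the authors' implementation. The pre-processing is a deliberately simplified stand-in: each numeric attribute is cut into equal-frequency bins, and each value is replaced by the empirical class distribution of its bin, so the new representation depends on the labels. A supervised discretization such as MODL would instead choose the bins with the class in mind. Standard k-means is then run on the encoded data, and the partition is scored against the classes with the adjusted Rand index. The function names, the deterministic farthest-first seeding (k-means++ is a common alternative) and the toy data are all hypothetical.

```python
import math
from collections import Counter


def supervised_encode(values, labels, n_bins=2):
    """Encode one numeric attribute by the class distribution of its bin.

    Hypothetical, simplified stand-in for a supervised pre-processing step:
    bins are equal-frequency (not optimised w.r.t. the class), but each value
    is replaced by the empirical P(class | bin), which does use the labels.
    """
    classes = sorted(set(labels))
    order = sorted(range(len(values)), key=lambda i: values[i])
    bin_of = [0] * len(values)
    for rank, i in enumerate(order):
        bin_of[i] = rank * n_bins // len(values)  # equal-frequency bin index
    counts = [Counter() for _ in range(n_bins)]
    for b, y in zip(bin_of, labels):
        counts[b][y] += 1
    return [[counts[b][c] / sum(counts[b].values()) for c in classes] for b in bin_of]


def encode_dataset(rows, labels, n_bins=2):
    """Apply supervised_encode column by column and concatenate the results."""
    encoded_cols = [supervised_encode(list(col), labels, n_bins) for col in zip(*rows)]
    return [[v for enc in encoded_cols for v in enc[i]] for i in range(len(rows))]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(range(len(points)),
                  key=lambda i: min(sq_dist(points[i], c) for c in centers))
        centers.append(list(points[far]))
    assign = None
    for _ in range(n_iter):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_assign == assign:
            break  # converged: assignments stopped changing
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert & Arabie, 1985) between two partitions."""
    n = len(labels_true)
    index = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)


# Hypothetical toy data: attribute 1 carries the class, attribute 2 has a
# much larger range but no bin-level relation to the class.
rows = [(1, 0), (2, 100), (3, 1), (4, 101), (5, 2), (6, 102), (7, 3), (8, 103)]
labels = ["a"] * 4 + ["b"] * 4

raw = kmeans(rows, k=2)                          # clustering on raw attributes
sup = kmeans(encode_dataset(rows, labels), k=2)  # clustering after supervised encoding
print(adjusted_rand_index(labels, raw))  # -1/6, i.e. about -0.167
print(adjusted_rand_index(labels, sup))  # 1.0
```

On the raw attributes, k-means splits along the large-range second attribute, which is unrelated to the class. After the encoding, that attribute collapses to the constant vector [0.5, 0.5], and k-means recovers the classes exactly. The data are constructed to show the mechanism; they do not reproduce any result from the paper.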
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ismaili, O.A., Lemaire, V., Cornuéjols, A. (2016). Supervised Pre-processings Are Useful for Supervised Clustering. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_13
DOI: https://doi.org/10.1007/978-3-319-25226-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25224-7
Online ISBN: 978-3-319-25226-1
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)