Supervised Pre-processings Are Useful for Supervised Clustering

Ismaili, Oumaima Alaoui; Lemaire, Vincent; Cornuéjols, Antoine

doi:10.1007/978-3-319-25226-1_13

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In recent years, researchers have turned their attention to supervised clustering, an approach that combines the main characteristics of traditional clustering and supervised classification. Motivated by the importance of pre-processing in traditional clustering, this paper explores to what extent supervised pre-processing steps can help traditional clustering algorithms perform better on supervised clustering tasks. The reported experiments show that standard clustering algorithms, when preceded by supervised pre-processing steps, are competitive with existing supervised clustering algorithms.
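
The general idea of the abstract can be illustrated with a small sketch. It is not the authors' method: the abstract does not specify their pre-processing steps, so the recoding below (equal-width binning followed by smoothed class-conditional log-probabilities) is a stand-in chosen for brevity. The function names, the toy data, the farthest-first seeding, and the use of purity as a score are all assumptions made for illustration.

```python
import math
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Plain k-means with deterministic farthest-first seeding (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def supervised_recode(points, classes, n_bins=2):
    """Hypothetical supervised pre-processing: replace each feature by
    log P(bin | class) for every class, with Laplace smoothing."""
    class_values = sorted(set(classes))
    columns = list(zip(*points))
    recoded = [[] for _ in points]
    for col in columns:
        lo, hi = min(col), max(col)
        width = (hi - lo) or 1.0
        bins = [min(int((v - lo) / width * n_bins), n_bins - 1) for v in col]
        for c in class_values:
            n_c = classes.count(c)
            counts = Counter(b for b, y in zip(bins, classes) if y == c)
            for i, b in enumerate(bins):
                recoded[i].append(math.log((counts[b] + 1) / (n_c + n_bins)))
    return [tuple(r) for r in recoded]

def purity(labels, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    by_cluster = {}
    for l, y in zip(labels, classes):
        by_cluster.setdefault(l, Counter())[y] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(classes)

# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [(0.1, 0.0), (0.2, 100.0), (0.3, 0.0), (0.4, 100.0),
     (0.6, 0.0), (0.7, 100.0), (0.8, 0.0), (0.9, 100.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

raw_purity = purity(kmeans(X, 2), y)
X_recoded = supervised_recode(X, y)
recoded_purity = purity(kmeans(X_recoded, 2), y)
print(raw_purity, recoded_purity)
```

On this toy data, k-means on the raw features splits along the noisy large-scale feature, while the same algorithm on the recoded features recovers the class structure. A realistic experiment would learn the recoding on training data only and score the clusters on held-out data.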

Author information

Authors and Affiliations

Orange Labs, Av. Pierre Marzin, 22307, Lannion Cedex, France
Oumaima Alaoui Ismaili & Vincent Lemaire
AgroParisTech, 16 rue Claude Bernard, 75005, Paris, France
Oumaima Alaoui Ismaili & Antoine Cornuéjols

Editor information

Editors and Affiliations

Jacobs University Bremen, Bremen, Germany
Adalbert F.X. Wilhelm
Institute of Medical Systems Biology, Universität Ulm, Ulm, Baden-Württemberg, Germany
Hans A. Kestler

About this paper

Cite this paper

Ismaili, O.A., Lemaire, V., Cornuéjols, A. (2016). Supervised Pre-processings Are Useful for Supervised Clustering. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_13

DOI: https://doi.org/10.1007/978-3-319-25226-1_13
Published: 04 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25224-7
Online ISBN: 978-3-319-25226-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)