Abstract
In this work, we address the hard clustering problem. We present a new clustering algorithm based on evolutionary computation that searches for the best partition with respect to a given quality measure. We present 32 partition transformations that are used as mutation operators. The algorithm is a \((1+1)\) evolutionary strategy that, at each step, selects a random mutation from a preselected subset of mutation operators. This preselection is performed by a classifier trained to predict the usefulness of each mutation for a given dataset. A comparison with a state-of-the-art approach to automated clustering algorithm and hyperparameter selection shows the superiority of the proposed algorithm.
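The \((1+1)\) scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the quality measure (within-cluster sum of squares), the two mutation operators, and all function names are hypothetical stand-ins chosen for brevity, and the classifier-based preselection is reduced to passing in an already chosen list of operators.

```python
import random

def wcss(points, labels, k):
    """Assumed quality measure: within-cluster sum of squared
    distances to the cluster centroids (lower is better)."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total

# Two illustrative mutation operators (hypothetical, not taken from the paper's 32).
def move_random_point(points, labels, k, rng):
    """Reassign one randomly chosen point to a randomly chosen cluster."""
    child = list(labels)
    child[rng.randrange(len(child))] = rng.randrange(k)
    return child

def swap_two_points(points, labels, k, rng):
    """Exchange the cluster labels of two randomly chosen points."""
    child = list(labels)
    i, j = rng.randrange(len(child)), rng.randrange(len(child))
    child[i], child[j] = child[j], child[i]
    return child

def one_plus_one_es(points, k, mutations, steps=2000, seed=0):
    """(1+1) scheme: keep a single parent partition, create one child per
    step with a randomly chosen operator, and keep the child if it is
    at least as good as the parent."""
    rng = random.Random(seed)
    parent = [rng.randrange(k) for _ in points]
    parent_cost = wcss(points, parent, k)
    for _ in range(steps):
        mutate = rng.choice(mutations)  # uniform choice among preselected operators
        child = mutate(points, parent, k, rng)
        child_cost = wcss(points, child, k)
        if child_cost <= parent_cost:
            parent, parent_cost = child, child_cost
    return parent, parent_cost

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, cost = one_plus_one_es(data, k=2, mutations=[move_random_point, swap_two_points])
    print(labels, cost)
```

In this sketch the `mutations` argument plays the role of the preselected subset; in the paper that subset is chosen per dataset by a trained classifier, which is outside the scope of the example.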
Notes
1. This phenomenon is most likely related to the properties of a specific cluster validity index (CVI) and can be mitigated further, e.g. by applying a different initialization method or by using a more complex mutation/evolutionary scheme.
2. The full collection of comparison boxplots can be found at https://bit.ly/2Zr3WwG.
Acknowledgments
The authors would like to thank Maxim Buzdalov for useful comments. The research was financially supported by the Russian Science Foundation, Agreement 17-71-30029.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tomp, D., Muravyov, S., Filchenkov, A., Parfenov, V. (2019). Meta-learning Based Evolutionary Clustering Algorithm. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A., Menezes, R., Allmendinger, R. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2019. Lecture Notes in Computer Science, vol 11871. Springer, Cham. https://doi.org/10.1007/978-3-030-33607-3_54
DOI: https://doi.org/10.1007/978-3-030-33607-3_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33606-6
Online ISBN: 978-3-030-33607-3
eBook Packages: Computer Science, Computer Science (R0)