Abstract
We propose a split-merge framework for evolutionary clustering. The proposed technique, entitled Split-Merge Evolutionary Clustering, is intended to be more robust to concept drift. At each step it considers only a portion of the data, derives clusters from that portion, and uses them to update the existing clustering solution. The framework models the existing and the newly derived clustering solutions as a bipartite graph, which guides the update: some existing clusters are merged with clusters from the newly constructed solution, while others are transformed by splitting their elements among several new clusters. We have evaluated the proposed evolutionary clustering technique and compared it with two other state-of-the-art algorithms: a bipartite correlation clustering algorithm (PivotBiCluster) and an incremental evolving clustering algorithm (Dynamic split-and-merge).
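The abstract describes the update step only at a high level. The Python sketch below shows one way such a bipartite split-merge update could be encoded. Its rules are our own illustrative assumptions, not the paper's. Clusters are lists of numeric tuples, and an edge joins an existing cluster to a new one when their centroids lie within a hypothetical `radius`. An existing cluster with exactly one neighbour is merged into it. One with several neighbours is split by sending each element to the nearest neighbouring centroid. One with no neighbours is kept unchanged. The names `build_bipartite_graph` and `split_merge_update` are hypothetical, not taken from the authors' implementation.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def build_bipartite_graph(old, new, radius):
    """Assumed edge rule: link old cluster i to new cluster j if their centroids are within `radius`."""
    old_c = {i: centroid(p) for i, p in old.items()}
    new_c = {j: centroid(p) for j, p in new.items()}
    edges = {i: [j for j in new if math.dist(old_c[i], new_c[j]) <= radius] for i in old}
    return edges, new_c


def split_merge_update(old, new, radius):
    """Update the existing clustering `old` with clusters `new` derived from a fresh data portion."""
    edges, new_c = build_bipartite_graph(old, new, radius)
    merged = {j: list(pts) for j, pts in new.items()}  # every new cluster survives
    kept = []  # old clusters with no counterpart in the new solution
    for i, pts in old.items():
        neighbours = edges[i]
        if not neighbours:
            kept.append(list(pts))
        elif len(neighbours) == 1:
            # Merge: the old cluster is absorbed by its only matching new cluster.
            merged[neighbours[0]].extend(pts)
        else:
            # Split: each old element moves to the nearest of the matching new clusters.
            for p in pts:
                j = min(neighbours, key=lambda k: math.dist(p, new_c[k]))
                merged[j].append(p)
    return list(merged.values()) + kept


if __name__ == "__main__":
    old = {"A": [(0.0, 0.0), (0.0, 2.0)], "B": [(9.0, 0.0), (15.0, 0.0)], "C": [(50.0, 50.0)]}
    new = {"X": [(1.0, 1.0)], "Y": [(10.0, 0.0)], "Z": [(14.0, 0.0)]}
    print(split_merge_update(old, new, radius=3.0))
    # [[(1.0, 1.0), (0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (9.0, 0.0)], [(14.0, 0.0), (15.0, 0.0)], [(50.0, 50.0)]]
```

In this toy run, cluster A matches only X and is merged into it. B matches both Y and Z, so its two elements are split between them. C has no counterpart and survives as is. Because the old and new clusters cover disjoint data portions, every element ends up in exactly one output cluster.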
This work is part of the research project “Scalable resource efficient systems for big data analytics” funded by the Knowledge Foundation (grant: 20140032) in Sweden.
Notes
1. Scikit-learn is a Python library for data mining and data analysis.
2.
References
Ackerman, M., Dasgupta, S.: Incremental clustering: the case for extra clusters. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS 2014, pp. 307–315 (2014)
Ailon, N., Avigdor-Elgrabli, N., Liberty, E., van Zuylen, A.: Improved approximation algorithms for bipartite correlation clustering. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 25–36. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23719-5_3
Angelov, P.: An approach for fuzzy rule-base adaptation using on-line clustering. Int. J. Approximate Reasoning 35, 275–289 (2004)
Awasthi, P., Balcan, M.F., Voevodski, K.: Local algorithms for interactive clustering. J. Mach. Learn. Res. 18(3), 1–35 (2017)
Balcan, M.F., Blum, A., Vempala, S.: A discriminative framework for clustering via similarity functions. In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC 2008, pp. 671–680 (2008)
Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Mach. Learn. 56(1–3), 89–113 (2004)
Bickel, S., Scheffer, T.: Multi-view clustering. In: Proceedings of the Fourth IEEE International Conference on Data Mining, ICDM 2004, pp. 19–26 (2004)
Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)
Blackard, J.A., Dean, D.J., Anderson, C.W.: UCI machine learning repository (1998). http://archive.ics.uci.edu/ml
Boeva, V., Angelova, M., Tsiporkova, E.: A split-merge evolutionary clustering algorithm. In: Proceedings of ICAART 2019, pp. 337–346 (2019)
Boeva, V., Tsiporkova, E., Kostadinova, E.: Analysis of multiple DNA microarray datasets. In: Kasabov, N. (ed.) Springer Handbook of Bio-/Neuroinformatics, pp. 223–234. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-30574-0_14
Bouchachia, A.: Evolving clustering: an asset for evolving systems. IEEE SMC News Lett. 36, 1–6 (2011)
Bouchachia, A., Vanaret, C.: Incremental learning based on growing Gaussian mixture models. In: Proceedings of the 10th International Conference on Machine Learning and Applications (ICMLA 2011), Honolulu, Hawaii (2011)
Câmpan, A., Şerban, G.: Adaptive clustering algorithms. In: Lamontagne, L., Marchand, M. (eds.) AI 2006. LNCS (LNAI), vol. 4013, pp. 407–418. Springer, Heidelberg (2006). https://doi.org/10.1007/11766247_35
Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic information retrieval. In: Proceedings of the 29th Annual ACM Symposium on Theory of Computing, STOC 1997, pp. 626–635 (1997)
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J.: Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47(4), 547–553 (2009)
Dell’Aglio, D., Valle, E.D., van Harmelen, F., Bernstein, A.: Stream reasoning: a survey and outlook. Data Sci. 1, 59–83 (2017)
Dovzan, D., Skrjanc, I.: Recursive clustering based on a Gustafson-Kessel algorithm. Evolving Syst. 2, 15–24 (2011)
Fa, R., Nandi, A.K.: SMART: novel self splitting-merging clustering algorithm. In: European Signal Processing Conference, Bucharest, Romania, 27–31 August 2012. IEEE (2012)
Farnstrom, F., Lewis, J., Elkan, C.: Scalability for clustering algorithms revisited. In: SIGKDD Explorations, London, vol. 2, pp. 51–57 (2000)
Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM Series on Statistics and Applied Probability. Society for Industrial and Applied Mathematics, USA (2007)
Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. ACM Trans. Knowl. Disc. Data 1(1), 4 (2007)
Goder, A., Filkov, V.: Consensus clustering algorithms: comparison and refinement. In: ALENEX, pp. 109–234 (2008)
Golino, H.F., de Amaral, L.S.B., Duarte, S.F.P., et al.: Predicting increased blood pressure using machine learning. J. Obes. 2014, 12 (2014)
Handl, J., Knowles, J., Kell, D.: Computational cluster validation in post-genomic data analysis. Bioinformatics 21(15), 3201–3212 (2005)
Jaccard, P.: The distribution of flora in the alpine zone. New Phytol. 11, 37–50 (1912)
Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Inc., Upper Saddle River (1988)
Larsen, B., Aone, C.: Fast and effective text mining using linear-time document clustering. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 1999, pp. 16–22. ACM (1999)
Li, Y., Feng, X., Zhang, M., Zhou, M., Wang, N., Wang, L.: Clustering of cardiovascular behavioral risk factors and blood pressure among people diagnosed with hypertension: a nationally representative survey in China. Sci. Rep. 6, 27627 (2016)
Lughofer, E.: A dynamic split-and-merge approach for evolving cluster models. Evolving Syst. 3, 135–151 (2012)
von Luxburg, U., Williamson, R.C., Guyon, I.: Clustering: science or art? In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning. Proceedings of Machine Learning Research, vol. 27, pp. 65–79 (2012)
Nakai, K., Kanehisa, M.: Expert system for predicting protein localization sites in gram-negative bacteria. Proteins Struct. Funct. Genet. 11, 95–110 (1991)
O’Callaghan, L., Mishra, N., Meyerson, A., Guha, S., Motwani, R.: Streaming-data algorithms for high-quality clustering. In: Proceedings of IEEE International Conference on Data Engineering, pp. 685–694 (2001)
van Rijsbergen, C.: Information Retrieval. Butterworth-Heinemann Newton, Oxford (1979)
Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)
Wang, M., Huang, V., Bosneag, A.M.C.: A novel Split-merge-evolve k clustering algorithm. In: IEEE 4th International Conference on Big Data Computing Service and Applications (BigDataService), Bamberg, Germany, 26–29 March 2018 (2018)
Xiang, Q., Mao, Q., Chai, K.M.A., Chieu, H.L., Tsang, I.W., Zhao, Z.: A split-merge framework for comparing clusterings. In: Proceedings of ICML 2012 (2012)
Zopf, M., et al.: Sequential clustering and contextual importance measures for incremental update summarization. In: Proceedings of COLING 2016, pp. 1071–1082 (2016)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Boeva, V., Angelova, M., Devagiri, V.M., Tsiporkova, E. (2019). Bipartite Split-Merge Evolutionary Clustering. In: van den Herik, J., Rocha, A., Steels, L. (eds) Agents and Artificial Intelligence. ICAART 2019. Lecture Notes in Computer Science, vol. 11978. Springer, Cham. https://doi.org/10.1007/978-3-030-37494-5_11
DOI: https://doi.org/10.1007/978-3-030-37494-5_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37493-8
Online ISBN: 978-3-030-37494-5
eBook Packages: Computer Science, Computer Science (R0)