Bipartite Split-Merge Evolutionary Clustering

  • Conference paper
Agents and Artificial Intelligence (ICAART 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11978)

Abstract

We propose a split-merge framework for evolutionary clustering. The proposed technique, entitled Split-Merge Evolutionary Clustering, is intended to be more robust in concept drift scenarios: at each step it considers a portion of the data and derives clusters from it, which are subsequently used to update the existing clustering solution. The framework models the two clustering solutions as a bipartite graph, which guides the update of the existing clustering solution: some clusters are merged with clusters from the newly constructed clustering, while others are transformed by splitting their elements among several new clusters. We have evaluated the proposed evolutionary clustering technique and compared it with two other state-of-the-art algorithms: a bipartite correlation clustering algorithm (PivotBiCluster) and an incremental evolving clustering algorithm (Dynamic split-and-merge).
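The update step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the edge rule (an existing cluster is linked to every new cluster whose centroid is nearest to at least one of its elements), the Euclidean distance, and the names `centroid` and `split_merge_update` are all assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def split_merge_update(old_clusters, new_clusters):
    """Hypothetical split-merge update of an existing clustering.

    old_clusters, new_clusters: lists of clusters, each a list of points.
    Returns (updated_clusters, edges, actions), where edges[i] is the set
    of new clusters linked to old cluster i in the bipartite graph.
    """
    new_centroids = [centroid(c) for c in new_clusters]
    updated = [list(c) for c in new_clusters]
    edges, actions = {}, {}
    for i, cluster in enumerate(old_clusters):
        # Assumed edge rule: each element points to its nearest new centroid.
        targets = [
            min(range(len(new_centroids)), key=lambda j: dist(p, new_centroids[j]))
            for p in cluster
        ]
        edges[i] = set(targets)
        # One neighbour: the whole old cluster merges into that new cluster.
        # Several neighbours: its elements are split among them.
        actions[i] = "merge" if len(edges[i]) == 1 else "split"
        for p, j in zip(cluster, targets):
            updated[j].append(p)
    return updated, edges, actions


old = [[(0.0, 0.0), (0.2, 0.1)], [(4.0, 4.0), (10.0, 10.0)]]
new = [[(0.1, 0.0), (0.0, 0.2)], [(5.0, 5.0), (5.0, 4.0)], [(9.0, 9.0), (10.0, 9.0)]]
updated, edges, actions = split_merge_update(old, new)
```

In this toy input the first existing cluster lies entirely next to one new cluster and is merged into it, while the second straddles two new clusters and is split between them. A real implementation would also need a rule for new clusters that receive no edges and for choosing the data portion at each step, which this sketch leaves out.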

This work is part of the research project “Scalable resource efficient systems for big data analytics” funded by the Knowledge Foundation (grant: 20140032) in Sweden.

Notes

  1. Scikit-learn is a Python library for data mining and data analysis.

  2. https://gitlab.com/machine_learning_vm/clustering_techniques

Author information

Corresponding author

Correspondence to Milena Angelova.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Boeva, V., Angelova, M., Devagiri, V.M., Tsiporkova, E. (2019). Bipartite Split-Merge Evolutionary Clustering. In: van den Herik, J., Rocha, A., Steels, L. (eds) Agents and Artificial Intelligence. ICAART 2019. Lecture Notes in Computer Science, vol 11978. Springer, Cham. https://doi.org/10.1007/978-3-030-37494-5_11

  • DOI: https://doi.org/10.1007/978-3-030-37494-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37493-8

  • Online ISBN: 978-3-030-37494-5

  • eBook Packages: Computer Science, Computer Science (R0)
