Reusable components for partitioning clustering algorithms

Artificial Intelligence Review

Abstract

Clustering algorithms are well established and widely used for solving data-mining tasks. Every clustering algorithm is composed of several solutions to specific sub-problems in the clustering process. Linked together, these solutions define the process and the structure of the algorithm. Many of these solutions occur in more than one clustering algorithm, and most new clustering algorithms reuse common solutions to typical sub-problems, drawn both from clustering and from other machine-learning algorithms. The problem is that these solutions are usually embedded in their algorithms, and the original algorithms are not designed to let them be shared easily outside the algorithm. We propose a way of designing new clustering algorithms, and improving existing ones, based on reusable components. Reusable components are well-documented, frequently occurring solutions to specific sub-problems in a particular area. We first identify reusable components as solutions to characteristic sub-problems in partitioning clustering algorithms, and then identify a generic structure for the design of partitioning clustering algorithms. We analyze several partitioning algorithms (K-means, X-means, MPCK-means, and Kohonen SOM) and identify the reusable components they contain. Finally, we give examples of how new clustering algorithms can be designed from these components.
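The abstract describes clustering algorithms as compositions of interchangeable solutions to sub-problems, plugged into a generic structure. As a rough illustration only, here is a minimal sketch of that idea for the K-means case. The component names and interfaces (`random_init`, `kmeanspp_init`, `nearest_assign`, `mean_update`, `PartitioningClusterer`) are hypothetical and are not the authors' component catalogue; the sketch assumes numeric points and a fixed number of clusters k. Swapping random seeding for k-means++ seeding (Arthur and Vassilvitskii 2007) yields a second algorithm without changing the rest of the loop.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical component interfaces; not the paper's actual component set.
Point = Sequence[float]
Centers = List[List[float]]


# Distance component: Euclidean distance (any function with this signature can be swapped in).
def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Initialization component: k data points sampled uniformly without replacement.
def random_init(data: Sequence[Point], k: int, dist, rng: random.Random) -> Centers:
    return [list(p) for p in rng.sample(list(data), k)]


# Initialization component: k-means++ seeding. Each new center is drawn with
# probability proportional to its squared distance from the nearest chosen center.
def kmeanspp_init(data: Sequence[Point], k: int, dist, rng: random.Random) -> Centers:
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        weights = [min(dist(p, c) for c in centers) ** 2 for p in data]
        if sum(weights) == 0:  # every point coincides with a chosen center
            pick = rng.choice(data)
        else:
            pick = rng.choices(data, weights=weights, k=1)[0]
        centers.append(list(pick))
    return centers


# Assignment component: each point goes to its nearest center (ties go to the lowest index).
def nearest_assign(data: Sequence[Point], centers: Centers, dist) -> List[int]:
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in data]


# Update component: each center moves to the mean of its members;
# an empty cluster keeps its previous center.
def mean_update(data: Sequence[Point], labels: List[int], centers: Centers) -> Centers:
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(data, labels) if lab == j]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))
    return new_centers


@dataclass
class PartitioningClusterer:
    """Generic iterative partitioning loop assembled from pluggable components."""
    init: Callable
    assign: Callable
    update: Callable
    dist: Callable = euclidean
    max_iter: int = 100

    def fit(self, data: Sequence[Point], k: int, seed: int = 0):
        rng = random.Random(seed)
        centers = self.init(data, k, self.dist, rng)
        labels: Optional[List[int]] = None
        for _ in range(self.max_iter):
            new_labels = self.assign(data, centers, self.dist)
            if new_labels == labels:  # stop once no point changes cluster
                break
            labels = new_labels
            centers = self.update(data, labels, centers)
        return centers, labels


# Two algorithms built from the same components; only the initializer differs.
kmeans = PartitioningClusterer(init=random_init, assign=nearest_assign, update=mean_update)
kmeans_pp = PartitioningClusterer(init=kmeanspp_init, assign=nearest_assign, update=mean_update)

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = kmeans_pp.fit(data, k=2, seed=1)
    print(labels, centers)
```

Because each component depends only on the small interfaces above, replacing one (for example, a different distance function) leaves the others untouched. This is the kind of reuse the abstract argues for, shown here only for the K-means case.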

References

  • Adams M, Coplien J, Gamoke R, Hammer R, Keeve F, Nicodemus K (1998) Fault-tolerant telecommunication system patterns. In: Rising L (ed) The pattern handbook: techniques, strategies, and applications. Cambridge University Press, New York, pp 189–202

  • Alexander C (1979) The timeless way of building. Oxford University Press, New York

  • Alexander C (2005a) The nature of order book 1: the phenomenon of life. The Center for Environmental Structure, Berkeley, CA

  • Alexander C (2005b) The nature of order book 2: the process of creating life. The Center for Environmental Structure, Berkeley, CA

  • Alexander C (2005c) The nature of order book 3: a vision of a living world. The Center for Environmental Structure, Berkeley, CA

  • Alexander C (2005d) The nature of order book 4: the luminous ground. The Center for Environmental Structure, Berkeley, CA

  • Arthur D, Vassilvitskii S (2007) K-Means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, Society for Industrial and Applied Mathematics, New Orleans, Louisiana, pp 1027–1035

  • Barbara D, Couto J, Li Y (2001) COOLCAT: An entropy-based algorithm for categorical clustering. In: Proceedings of the eleventh international conference on information and knowledge management, pp 582–589

  • Basu S, Bilenko M, Mooney RJ (2004) A probabilistic framework for semi-supervised clustering. In: Proceedings of ACM SIGKDD, Seattle, WA, pp 59–68

  • Bennett KP, Bradley PS, Demiriz A (2000) Constrained k-means clustering. Microsoft Research. Available via DIALOG. ftp://ftp.research.microsoft.com/pub/tr/tr-2000-65.ps Accessed 9 Apr 2009

  • Berkhin P (2006) A survey of clustering data mining techniques. In: Kogan J, Nicholas C, Teboulle M (eds) Grouping multidimensional data. Springer, Berlin-Heidelberg, pp 25–71

  • Bilenko M, Basu S, Mooney RJ (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the twenty-first international conference on machine learning, Banff, Canada, pp 81–88

  • Bradley PS, Fayyad UM (1998) Refining initial points for k-means clustering. In: Proceedings of the fifteenth international conference on machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, pp 91–99

  • Cheung YM (2003) k*-Means: a new generalized k-means clustering algorithm. Pattern Recog Lett 24: 2883–2893

  • Coplien JO, Harrison NB (2005) Organizational patterns of agile software development. Prentice-Hall PTR, Upper Saddle River, NJ

  • Coplien JO, Schmidt DC (1995) Pattern languages of program design. Addison-Wesley Professional, Reading, MA

  • Delibasic B, Kirchner K, Ruhland J et al (2008) A pattern-based data mining approach. In: Preisach C, Burkhardt H, Schmidt-Thieme L (eds) Data analysis, machine learning and applications. Springer, Berlin/Heidelberg, pp 327–334

  • Ding C, He X (2004) K-means clustering via principal component analysis. In: Proceedings of the twenty-first international conference on machine learning, ACM, New York, NY, p 29

  • Drossos N, Papagelis A, Kalles D (2000) Decision tree toolkit: a component-based library of decision tree algorithms. In: Zighed DZ, Komorowski J, Zytkow J (eds) Principles of data mining and knowledge discovery. Springer, Berlin/Heidelberg, pp 121–150

  • Freeman P (1983) Reusable software engineering: concepts and research directions. In: Workshop on reusability in programming, ITT Programming, Stratford, Connecticut, pp 2–16

  • Gamma E, Helm R, Johnson R, Vlissides JM (1995) Design patterns: elements of reusable object-oriented software. Addison-Wesley, Reading, MA

  • Hamerly G, Elkan C (2003) Learning the k in k-means. In: Proceedings of the seventeenth annual conference on neural information processing systems, pp 281–288

  • Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann Publishers, San Francisco

  • Hartigan JA, Wong MA (1979) A k-means clustering algorithm. Appl Stat 28: 100–108

  • Kohonen T (2001) Self-organizing maps. Springer, Berlin

  • Lea D (1994) Design patterns for avionics control systems. Available via DIALOG. http://gee.cs.oswego.edu/dl/acs/acs.pdf. Accessed 9 Apr 2009

  • Likas A, Vlassis N, Verbeek JJ (2002) The global k-means clustering algorithm. Pattern Recog 36: 451–461

  • Mierswa I, Wurst M, Klinkenberg R, Scholz M, Euler T (2006) YALE: rapid prototyping for complex data mining tasks. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, pp 935–940

  • Pelleg D, Moore A (2000) X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the seventeenth international conference on machine learning, Morgan Kaufmann, San Francisco, pp 727–734

  • Siddique NH, Amavasai BP, Ikuta A (2007) Special issue on hybrid techniques in AI. Artif Intell Rev 27: 71–

  • Sommerville I (2004) Software engineering. Pearson, Boston

  • Sonnenburg S, Braun ML, Ong CS, Bengio S, Bottou L, Holmes G, LeCun Y, Müller KR, Pereira F, Rasmussen CE, Rätsch G, Schölkopf B, Smola A, Vincent P, Weston J, Williamson RC (2007) The need for open source software in machine learning. J Mach Learn Res 8: 2443–2466

  • Steinley D (2006) K-means clustering: a half-century synthesis. British J Math Stat Psychol 59: 1–34

  • Su MC, Liu TK, Chang HT (2002) Improving the self-organizing feature map algorithm using an efficient initialization scheme. Tamkang J Sci Eng 5: 35–48

  • Tracz W (1990) Where does reuse start? ACM SIGSOFT Softw Eng Notes 15: 42–46

  • Winn T, Calder P (2002) Is this a pattern? IEEE Softw 19(1): 59–66

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco

  • Xing EP, Ng AY, Jordan MI, Russell S (2003) Distance metric learning with application to clustering with side-information. Adv Neural Inf Process Syst 15: 521–528

  • Zaki M, De N, Gao F, Palmerini P, Parimi N, Pathuri J, Phoophakdee B, Urban J (2005) Generic pattern mining via data mining template library. In: Boulicaut JF, De Raedt L, Mannila H (eds) Constraint-based mining and inductive databases. European workshop on inductive databases and constraint based mining. Springer, Berlin/Heidelberg, pp 362–379

  • Zezula P, Amato G, Dohnal V, Batko M (2006) Similarity search: the metric space approach (Advances in database systems). Springer, New York

Author information

Corresponding author

Correspondence to Boris Delibašić.

About this article

Cite this article

Delibašić, B., Kirchner, K., Ruhland, J. et al. Reusable components for partitioning clustering algorithms. Artif Intell Rev 32, 59–75 (2009). https://doi.org/10.1007/s10462-009-9133-6

  • DOI: https://doi.org/10.1007/s10462-009-9133-6
