Reusable components for partitioning clustering algorithms

Delibašić, Boris; Kirchner, Kathrin; Ruhland, Johannes; Jovanović, Miloš; Vukićević, Milan

doi:10.1007/s10462-009-9133-6

Published: 20 October 2009

Volume 32, pages 59–75 (2009)
Artificial Intelligence Review

Abstract

Clustering algorithms are well established and widely used for solving data-mining tasks. Every clustering algorithm is composed of several solutions to specific sub-problems in the clustering process. These solutions are linked together within the algorithm, and they define its process and structure. Many of these solutions occur in more than one clustering algorithm. New clustering algorithms, for the most part, incorporate frequently occurring solutions to typical sub-problems from clustering as well as from other machine-learning algorithms. The problem is that these solutions are usually integrated into their algorithms, and the original algorithms are not designed to share solutions to sub-problems easily outside the original algorithm. We propose a way of designing clustering algorithms, and of improving existing ones, based on reusable components. Reusable components are well-documented, frequently occurring solutions to specific sub-problems in a specific area. We therefore first identify reusable components as solutions to characteristic sub-problems in partitioning clustering algorithms, and then identify a generic structure for the design of partitioning clustering algorithms. We analyze several partitioning algorithms (K-means, X-means, MPCK-means, and Kohonen SOM) and identify the reusable components in them. We give examples of how new clustering algorithms can be designed from these components.
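To make the idea of reusable components concrete, the following is a minimal sketch of a partitioning clustering loop whose sub-problem solutions are passed in as interchangeable functions. The decomposition into initialization, distance, and representative-update components, and all names used here (`partition_cluster`, `first_k_init`, `random_init`, `mean_update`, and so on), are illustrative assumptions rather than the component catalogue or generic structure defined in the article.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


# --- Hypothetical "distance measure" components ---
def squared_euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


# --- Hypothetical "initialize representatives" components ---
def first_k_init(data: List[Point], k: int, rng: random.Random) -> List[List[float]]:
    # Deterministic: use the first k points as initial representatives.
    return [list(p) for p in data[:k]]


def random_init(data: List[Point], k: int, rng: random.Random) -> List[List[float]]:
    # Pick k distinct points at random as initial representatives.
    return [list(p) for p in rng.sample(list(data), k)]


# --- Hypothetical "update representatives" component ---
def mean_update(members: List[Point], old: Point) -> List[float]:
    # Keep the old representative if the cluster is empty.
    if not members:
        return list(old)
    n = len(members)
    return [sum(col) / n for col in zip(*members)]


def partition_cluster(
    data: List[Point],
    k: int,
    init: Callable = first_k_init,
    distance: Callable = squared_euclidean,
    update: Callable = mean_update,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[List[float]]]:
    """Generic loop: the components decide how each sub-problem is solved."""
    rng = random.Random(seed)
    centers = init(data, k, rng)
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assign every point to its nearest representative.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centers[j])) for p in data
        ]
        # Stop when the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each representative from its current members.
        centers = [
            update([p for p, l in zip(data, labels) if l == j], centers[j])
            for j in range(k)
        ]
    return labels, centers


if __name__ == "__main__":
    data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    # K-means-like configuration (assumed defaults).
    print(partition_cluster(data, 2))
    # Same loop, different components swapped in.
    print(partition_cluster(data, 2, init=random_init, distance=manhattan))
```

With the default components the loop behaves like a plain K-means variant; replacing a single argument, for example `distance=manhattan` or `init=random_init`, yields a different algorithm without touching the loop itself, which is the kind of reuse the abstract argues for.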

Author information

Authors and Affiliations

Faculty of Organizational Sciences, University of Belgrade, Jove Ilića 154, Belgrade, Serbia
Boris Delibašić, Miloš Jovanović & Milan Vukićević
Faculty of Economics and Business Administration, Friedrich Schiller University of Jena, Carl-Zeiß Straße 3, Jena, Germany
Kathrin Kirchner & Johannes Ruhland

Corresponding author

Correspondence to Boris Delibašić.

About this article

Cite this article

Delibašić, B., Kirchner, K., Ruhland, J. et al. Reusable components for partitioning clustering algorithms. Artif Intell Rev 32, 59–75 (2009). https://doi.org/10.1007/s10462-009-9133-6

Published: 20 October 2009
Issue Date: December 2009
DOI: https://doi.org/10.1007/s10462-009-9133-6
