Two density-based k-means initialization algorithms for non-metric data clustering

Bianchi, Filippo Maria; Livi, Lorenzo; Rizzi, Antonello

doi:10.1007/s10044-014-0440-4

Theoretical Advances
Published: 15 January 2015

Volume 19, pages 745–763, (2016)

Pattern Analysis and Applications

Abstract

In this paper, we propose a density-based algorithm for selecting cluster representatives, which identifies the most central patterns in the dense regions of a dataset. The method, implemented using two different strategies, is applicable to input spaces with non-trivial geometry. Our approach exploits a probability density function built through the Parzen estimator, which relies on a (not necessarily metric) dissimilarity measure. Because representative extraction is a general-purpose task, the method is applicable in many different contexts. To test the proposed procedure, however, we specifically consider the problem of initializing the k-means algorithm. We address problems defined on standard real-valued vectors, labeled graphs, sequences of real-valued vectors, and sequences of characters. The results demonstrate the effectiveness of the proposed representative selection method with respect to other state-of-the-art solutions.
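
To make the general idea concrete, the following is a minimal sketch of how a Parzen-style density could be estimated from a precomputed dissimilarity matrix and used to pick k initial representatives. It is not either of the two strategies proposed in the paper, whose details are not given in the abstract. The function names `parzen_density` and `select_representatives`, the Gaussian kernel, the bandwidth `h`, and the greedy rule that keeps candidates more than `h` away from those already chosen are all assumptions made for illustration.

```python
import math

def parzen_density(D, h):
    """Parzen-style density at each pattern, from a dissimilarity matrix D.

    D[i][j] is the dissimilarity between patterns i and j. It is only
    assumed to be non-negative; symmetry and the triangle inequality
    are not required. A Gaussian kernel with bandwidth h is an
    assumption of this sketch.
    """
    n = len(D)
    return [
        sum(math.exp(-0.5 * (D[i][j] / h) ** 2) for j in range(n)) / n
        for i in range(n)
    ]

def select_representatives(D, k, h):
    """Greedily pick up to k dense patterns that are mutually far apart.

    Candidates are visited in order of decreasing density. A candidate
    is accepted only if its dissimilarity to every accepted pattern
    exceeds h in both directions (D may be asymmetric). Fewer than k
    indices are returned if not enough candidates pass the check.
    """
    density = parzen_density(D, h)
    order = sorted(range(len(D)), key=lambda i: -density[i])
    reps = []
    for i in order:
        if all(D[i][r] > h and D[r][i] > h for r in reps):
            reps.append(i)
        if len(reps) == k:
            break
    return reps

# Toy usage: two well-separated groups on a line, with absolute
# difference standing in for an arbitrary dissimilarity measure.
points = [0, 1, 2, 50, 51, 52]
D = [[abs(a - b) for b in points] for a in points]
reps = select_representatives(D, k=2, h=5.0)
print(sorted(reps))
```

Because only the matrix `D` is accessed, the same sketch applies unchanged to graphs or sequences once a suitable dissimilarity has been computed; the selected indices could then seed a k-means-like procedure that uses the same dissimilarity.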

Notes

http://libspare.org/

Author information

Authors and Affiliations

Department of Information Engineering, Electronics, and Telecommunications, SAPIENZA University of Rome, Via Eudossiana 18, 00184, Rome, Italy
Filippo Maria Bianchi, Lorenzo Livi & Antonello Rizzi

Corresponding author

Correspondence to Lorenzo Livi.

About this article

Cite this article

Bianchi, F.M., Livi, L. & Rizzi, A. Two density-based k-means initialization algorithms for non-metric data clustering. Pattern Anal Applic 19, 745–763 (2016). https://doi.org/10.1007/s10044-014-0440-4

Received: 13 August 2013
Accepted: 21 December 2014
Published: 15 January 2015
Issue Date: August 2016
DOI: https://doi.org/10.1007/s10044-014-0440-4
