DOI: 10.1145/2797143.2797173
research-article

A Density Based k-Means Initialization Scheme

Published: 25 September 2015

Abstract

In this paper we present results from several versions of a new initialization scheme for the k-Means algorithm. k-Means is probably the most fundamental clustering algorithm, with applications in many fields, such as Signal Processing, Image Colour Segmentation, and Web data management. The initialization of the algorithm is of great interest and raises two major challenges. The first is to determine k, the number of clusters. The second is to determine the initial k seeds. Here we focus mainly on the latter. Our approach is heuristic, hence rigorous mathematical arguments are not presented. We rely mainly on criteria such as density, Euclidean distance, and Mardia's multivariate kurtosis statistic. To test the quality of our results, we apply a few cluster validity measures other than the commonly used Sum of Squared Errors (SSE), which we believe are suitable for evaluation purposes.
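For readers unfamiliar with two of the quantities named in the abstract, the following is a minimal sketch of how they are commonly computed. It is not the authors' initialization scheme, which the abstract does not describe in detail. It assumes the standard definition of Mardia's multivariate kurtosis, the mean of the squared Mahalanobis distances raised to the second power, using the divide-by-n covariance matrix, and the usual definition of SSE as the sum of squared Euclidean distances from each point to its nearest centre. All function names (`mardia_kurtosis`, `sse`, and the helpers) are hypothetical and chosen only for illustration.

```python
def mean_vector(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def covariance_matrix(points, mean):
    """Biased (divide-by-n) covariance matrix, as used in Mardia's statistic."""
    n, p = len(points), len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for x in points:
        d = [xi - mi for xi, mi in zip(x, mean)]
        for r in range(p):
            for c in range(p):
                cov[r][c] += d[r] * d[c] / n
    return cov


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for k in range(col, p + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, p))
        x[r] = (m[r][p] - s) / m[r][r]
    return x


def mardia_kurtosis(points):
    """Mean over all points of (squared Mahalanobis distance to the mean) squared."""
    mean = mean_vector(points)
    cov = covariance_matrix(points, mean)
    total = 0.0
    for x in points:
        d = [xi - mi for xi, mi in zip(x, mean)]
        y = solve(cov, d)  # y = cov^{-1} d, without forming the inverse
        total += sum(di * yi for di, yi in zip(d, y)) ** 2
    return total / len(points)


def sse(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest centre."""
    return sum(
        min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers)
        for x in points
    )


if __name__ == "__main__":
    square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
    print(mardia_kurtosis(square))
    print(sse([(0, 0), (0, 2), (10, 0), (10, 2)], [(0, 1), (10, 1)]))
```

For p-dimensional normally distributed data the kurtosis value is expected to be close to p(p + 2), so a value far from that reference may suggest that a group of points is not a single roughly Gaussian cluster. How the paper actually uses the statistic when choosing seeds is not stated in the abstract.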

Cited By

  • (2023) Fabrication-Aware Joint Clustering in Freeform Space-Frames. Buildings 13:4 (962). DOI: 10.3390/buildings13040962. Online publication date: 4-Apr-2023
  • (2018) Dimensionally Distributed Density Estimation. Artificial Intelligence and Soft Computing (343-353). DOI: 10.1007/978-3-319-91262-2_31. Online publication date: 11-May-2018
  • (2016) Two Step Clustering Model for K-Means Algorithm. Proceedings of the Fifth International Conference on Network, Communication and Computing (213-217). DOI: 10.1145/3033288.3033347. Online publication date: 17-Dec-2016

Published In

EANN '15: Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS)
September 2015
266 pages
ISBN: 9781450335805
DOI: 10.1145/2797143
© 2015 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • Aristotle University of Thessaloniki
  • INNS: International Neural Network Society

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Algorithms
  2. Clustering
  3. Index Structures
  4. Initialization
  5. Text Mining
  6. Web Information Retrieval
  7. k-Means

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

16th EANN workshops

Acceptance Rates

EANN '15 Paper Acceptance Rate 36 of 60 submissions, 60%;
Overall Acceptance Rate 36 of 60 submissions, 60%
