Skip to main content

Size-Constrained Clustering Using an Initial Points Selection Method

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2013)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8041))

  • 2351 Accesses

Abstract

Size-Constrained clustering tries to solve the problem that how to classify dataset into groups based on each document’s similarity with additional requirement which each group size is within a fixed range. By far, adding constraints to assignment step in K-Means clustering is a main approach. But the performance of the algorithm also depends highly on the initial cluster centers like standard K-Means. We propose an initial points selection method by recursively discovering the point with large density around it. Root Mean Square Error and convergence speed (iteration times) are the two most important evaluation standards for clustering using an iterative procedure. Our experiments are conducted on about ten thousand research proposals of National Natural Science Foundation of China and the results show that our method can reduce the iteration times by over 50% and get smaller Root Mean Square Error. The method is scalable and can be coupled with a scalable size-constrained clustering algorithm to address the large-scale clustering problem in data mining.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Dubes, R.C., Jain, A.K.: Algorithms for Clustering Data. Prentice Hall (1988)

    Google Scholar 

  2. Bradley, P.S., Bennett, K.P., Demiriz, A.: Constrained K-Means Clustering. MSR-RT-2000-65, Microsoft Research (2000)

    Google Scholar 

  3. Bertsekas, D.P.: Linear Network Optimization. MIT Press, Cambridge (1991)

    MATH  Google Scholar 

  4. Kelly, D.J., O’Neill, G.M.: The Minimum Cost Flow Problem and The Network Simplex Solution Method (September 1991)

    Google Scholar 

  5. Zhao, J.: Optimal Clustering: Genetic Constrained K-Means and Linear Programming Algorithms (2006)

    Google Scholar 

  6. Zhu, S., Wang, D., Li, T.: Data clustering with size constraints. Knowledge-Based Systems 23(8), 883–889 (2010)

    Article  MathSciNet  Google Scholar 

  7. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. John Wiley and Sons, NY (1973)

    MATH  Google Scholar 

  8. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs (1988)

    MATH  Google Scholar 

  9. Bradley, P.S., Fayyad, U.M.: Refing initial points for K-Means clustering. In: 15th Internat. Conf. on Machine Learning (1998)

    Google Scholar 

  10. Deelers, S., Auwatanamongkol, S.: Enhancing K-Means algorithm with initial cluster centers derived from data partitioning along the data axis with the highest variance. Internat. J. Comput. Sci. 2, 247–252 (2007)

    Google Scholar 

  11. Khan, S.S., Ahmad, A.: Cluster center initialization algorithm for K-Means clustering. Pattern Recognition Letters 25(11), 1293–1302 (2004)

    Article  Google Scholar 

  12. Steinley, D., Brusco, J.M.: Initialization K-Means Batch Clustering: A Critical Evaluation of Several Techniques. Journal of Classification 24(1), 99–121 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  13. Lozano, J.A., Pena, J.M., Larranaga, P.: An empirical comparison of four initialization methods for the K-Means algorithm. Pattern Recognition Lett. 20, 1027–1040 (1999)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lei, K., Wang, S., Song, W., Li, Q. (2013). Size-Constrained Clustering Using an Initial Points Selection Method. In: Wang, M. (eds) Knowledge Science, Engineering and Management. KSEM 2013. Lecture Notes in Computer Science(), vol 8041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39787-5_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-39787-5_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39786-8

  • Online ISBN: 978-3-642-39787-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics