Estimating the Number of Clusters as a Pre-processing Step to Unsupervised Learning

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 557)

Abstract

A major challenge in unsupervised machine learning is to devise methods for estimating, in advance, the number of clusters present in a given set of patterns. When such an estimate is supplied as input to clustering algorithms that require it, the chances of obtaining better results increase substantially. The work described in this paper investigates the performance of an algorithm, based on the sequential clustering algorithm BSAS (Basic Sequential Algorithmic Scheme), that produces a list of good estimates for the number of clusters in a given set of patterns, ordered by frequency of occurrence. BSAS is a convenient choice because the order in which patterns are presented to the algorithm can affect the induced clustering. The results of experiments on eight sets of patterns can be considered empirical evidence that the procedure is a practical and reliable pre-processing step for clustering algorithms that require the number of clusters.
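The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the procedure's details, so the sketch assumes that BSAS is run repeatedly on random orderings of the patterns and that the resulting cluster counts are tallied. The names `bsas`, `estimate_cluster_counts`, `theta` (dissimilarity threshold), `max_clusters`, `runs` and `seed` are hypothetical, as is the choice of Euclidean distance to a mean-vector representative.

```python
import math
import random
from collections import Counter


def bsas(patterns, theta, max_clusters):
    """Run one pass of BSAS and return the number of clusters created.

    Assumptions (not from the paper): Euclidean distance, and each
    cluster is represented by the mean of its members.
    """
    centroids = [list(patterns[0])]  # first pattern seeds the first cluster
    sizes = [1]
    for x in patterns[1:]:
        dists = [math.dist(x, c) for c in centroids]
        k = min(range(len(dists)), key=dists.__getitem__)  # nearest cluster
        if dists[k] > theta and len(centroids) < max_clusters:
            # Too far from every existing cluster: open a new one.
            centroids.append(list(x))
            sizes.append(1)
        else:
            # Assign to the nearest cluster and update its mean incrementally.
            sizes[k] += 1
            centroids[k] = [c + (xi - c) / sizes[k]
                            for c, xi in zip(centroids[k], x)]
    return len(centroids)


def estimate_cluster_counts(patterns, theta, max_clusters, runs=100, seed=0):
    """Tally cluster counts over shuffled presentations of the patterns.

    Returns (number_of_clusters, frequency) pairs, most frequent first.
    """
    rng = random.Random(seed)
    data = list(patterns)
    counts = Counter()
    for _ in range(runs):
        rng.shuffle(data)  # BSAS is order-sensitive, so vary the order
        counts[bsas(data, theta, max_clusters)] += 1
    return counts.most_common()
```

Calling `estimate_cluster_counts(points, theta=3.0, max_clusters=10)` on a list of coordinate tuples returns the candidate cluster counts ranked by how often they occurred; the top entries could then be passed to an algorithm such as k-means that needs the number of clusters up front. How `theta` should be chosen is left open here, since the abstract does not say.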

Acknowledgments

The authors thank CAPES, CNPq and FACCAMP.

Corresponding author

Correspondence to Paulo Rogerio Nietto.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Nietto, P.R., do Carmo Nicoletti, M. (2017). Estimating the Number of Clusters as a Pre-processing Step to Unsupervised Learning. In: Madureira, A., Abraham, A., Gamboa, D., Novais, P. (eds) Intelligent Systems Design and Applications. ISDA 2016. Advances in Intelligent Systems and Computing, vol 557. Springer, Cham. https://doi.org/10.1007/978-3-319-53480-0_3

  • DOI: https://doi.org/10.1007/978-3-319-53480-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53479-4

  • Online ISBN: 978-3-319-53480-0

  • eBook Packages: Engineering, Engineering (R0)