Estimating the Number of Clusters as a Pre-processing Step to Unsupervised Learning

Nietto, Paulo Rogerio; do Carmo Nicoletti, Maria

doi:10.1007/978-3-319-53480-0_3


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 557)

Included in the following conference series:

International Conference on Intelligent Systems Design and Applications


Abstract

A major challenge in unsupervised machine learning is devising methods that estimate, in advance, the number of clusters in a given set of patterns. Supplying such an estimate as input to clustering algorithms that require it substantially increases the chances of getting better results. This paper investigates the performance of an algorithm, based on the sequential clustering scheme BSAS (Basic Sequential Algorithmic Scheme), that produces a list of good estimates for the number of clusters in a given set of patterns, ordered by frequency of occurrence. BSAS is a convenient choice because the order in which patterns are presented to the algorithm can affect the induced clustering. The results of experiments on eight sets of patterns provide empirical evidence that the procedure can be a practical and reliable pre-processing step for clustering algorithms that require the number of clusters.
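
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the procedure's details, so the sketch assumes Euclidean distance, cluster means as representatives, a user-supplied range of dissimilarity thresholds, and random shuffles as the different presentation orders. The function names `bsas_cluster_count` and `estimate_cluster_counts`, and the parameters `thetas`, `n_orders`, `max_clusters` and `seed`, are hypothetical.

```python
import math
import random
from collections import Counter


def bsas_cluster_count(patterns, theta, max_clusters):
    """Run one BSAS pass over `patterns` (in the given order) and
    return how many clusters were created."""
    centroids = []  # running mean of each cluster (assumed representative)
    sizes = []      # number of patterns assigned to each cluster
    for x in patterns:
        if not centroids:
            # The first pattern always starts the first cluster.
            centroids.append(list(x))
            sizes.append(1)
            continue
        # Locate the nearest existing cluster representative.
        dists = [math.dist(x, c) for c in centroids]
        k = min(range(len(dists)), key=dists.__getitem__)
        if dists[k] > theta and len(centroids) < max_clusters:
            # Too far from every cluster: start a new one.
            centroids.append(list(x))
            sizes.append(1)
        else:
            # Assign to the nearest cluster and update its running mean.
            sizes[k] += 1
            centroids[k] = [c + (xi - c) / sizes[k]
                            for c, xi in zip(centroids[k], x)]
    return len(centroids)


def estimate_cluster_counts(patterns, thetas, n_orders=20,
                            max_clusters=50, seed=0):
    """Tally the cluster counts BSAS produces over several thresholds and
    presentation orders; return (count, frequency) pairs, most frequent first."""
    rng = random.Random(seed)
    tally = Counter()
    for theta in thetas:
        for _ in range(n_orders):
            shuffled = list(patterns)
            rng.shuffle(shuffled)  # a different presentation order each run
            tally[bsas_cluster_count(shuffled, theta, max_clusters)] += 1
    return tally.most_common()
```

Calling `estimate_cluster_counts(patterns, thetas=[2.0, 3.0, 4.0])` on a list of coordinate tuples returns candidate cluster counts ordered by how often they occurred; the head of that list could then be passed to an algorithm such as k-means that needs the number of clusters up front. How the thresholds are chosen strongly affects the tally, and that choice is left open here.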



Acknowledgments

The authors thank CAPES, CNPq and FACCAMP.

Author information

Authors and Affiliations

Faculdade Campo Limpo Paulista (FACCAMP), Campo Limpo Paulista, São Paulo, Brazil
Paulo Rogerio Nietto & Maria do Carmo Nicoletti
Universidade Federal de São Carlos (UFSCar), São Carlos, São Paulo, Brazil
Maria do Carmo Nicoletti


Corresponding author

Correspondence to Paulo Rogerio Nietto.

Editor information

Editors and Affiliations

Departamento de Engenharia Informática, Instituto Superior de Engenharia do Porto, Porto, Portugal
Ana Maria Madureira
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs, Auburn, Washington, USA
Ajith Abraham
Polytechnic Institute of Porto, Felgueiras, Portugal
Dorabela Gamboa
Campus of Gualtar, University of Minho, Braga, Portugal
Paulo Novais


About this paper

Cite this paper

Nietto, P.R., do Carmo Nicoletti, M. (2017). Estimating the Number of Clusters as a Pre-processing Step to Unsupervised Learning. In: Madureira, A., Abraham, A., Gamboa, D., Novais, P. (eds) Intelligent Systems Design and Applications. ISDA 2016. Advances in Intelligent Systems and Computing, vol 557. Springer, Cham. https://doi.org/10.1007/978-3-319-53480-0_3


DOI: https://doi.org/10.1007/978-3-319-53480-0_3
Published: 23 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53479-4
Online ISBN: 978-3-319-53480-0
eBook Packages: Engineering, Engineering (R0)
