Data clustering based on principal curves

  • Regular Article
Advances in Data Analysis and Classification

Abstract

In this contribution we present a new method for data clustering based on principal curves. Principal curves are a nonlinear generalization of principal component analysis and may also be regarded as continuous versions of 1-D self-organizing maps. The proposed method uses the k-segment algorithm to extract a principal curve and then divides that curve into two or more curves, according to the number of clusters defined by the user. The distance between each data point and each of the generated curves is then calculated, and the point is assigned to the cluster whose curve lies at the smallest distance. The method was applied to nine databases with different dimensionality and numbers of classes. The results were compared with three clustering algorithms: the k-means algorithm and the 1-D and 2-D self-organizing map algorithms. Experiments show that the method is suitable for clusters with elongated and spherical shapes and that, on some data sets, it achieved significantly better results than the other clustering algorithms used in this work.
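The final assignment step described in the abstract can be illustrated with a minimal sketch. It assumes that each sub-curve is already available as a polyline (an ordered list of vertices), as a k-segment style algorithm would produce; the extraction and splitting steps are not shown, and the curve coordinates, sample points, and function names below are hypothetical, chosen only for illustration rather than taken from the article.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b (any dimension)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Projection parameter clamped to [0, 1]; a degenerate segment collapses to a.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)

def point_curve_distance(p, curve):
    """Distance from p to a polyline: the minimum over its segments."""
    return min(point_segment_distance(p, a, b) for a, b in zip(curve, curve[1:]))

def assign_to_curves(points, curves):
    """Label each point with the index of the curve at the smallest distance."""
    return [
        min(range(len(curves)), key=lambda j: point_curve_distance(p, curves[j]))
        for p in points
    ]

# Hypothetical sub-curves (one per cluster) and sample points, for illustration only.
curves = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],
    [(0.0, 3.0), (1.0, 3.5), (2.0, 3.0)],
]
points = [(0.5, 0.2), (1.5, 3.1), (2.0, 1.0), (0.0, 2.0)]
labels = assign_to_curves(points, curves)
print(labels)  # [0, 1, 0, 1]
```

Because the distance is measured to a whole curve rather than to a single centroid, a point far from the middle of an elongated cluster can still be assigned to it, which is consistent with the abstract's remark about elongated cluster shapes.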

Author information

Corresponding author

Correspondence to Danton Diego Ferreira.

Cite this article

Moraes, E.C.C., Ferreira, D.D., Vitor, G.B. et al. Data clustering based on principal curves. Adv Data Anal Classif 14, 77–96 (2020). https://doi.org/10.1007/s11634-019-00363-w
