Gaussian kernel width exploration and cone cluster labeling for support vector clustering

  • Industrial and Commercial Application
Pattern Analysis and Applications

Abstract

Clustering groups data points so that intra-cluster similarity is maximized while inter-cluster similarity is minimized. Support vector clustering (SVC) is a clustering approach that can identify arbitrarily shaped cluster boundaries. The execution time of SVC depends heavily on three factors: the choice of the width of the kernel function, which determines a nonlinear transformation of the input data; the solution of a quadratic program; and the way that the output of the quadratic program is used to produce clusters. This paper builds on our prior SVC research in two ways. First, we propose a method for identifying a kernel width value in a region where our experiments suggest that the clustering structure is changing significantly. This value can form the starting point for efficient exploration of the space of kernel width values. Second, we offer a technique, called cone cluster labeling, that uses the output of the quadratic program to build clusters in a novel way that avoids an important deficiency present in previous methods. Our experimental results use both two-dimensional and high-dimensional data sets.
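To make the role of the kernel width concrete, the following is a minimal sketch, assuming the Gaussian kernel form K(x, y) = exp(-q ||x - y||^2) used in the original SVC formulation of Ben-Hur et al., with q as the width parameter. It is not the authors' width-selection method; the function name `gaussian_kernel_matrix`, the toy points, and the sampled q values are illustrative assumptions only.

```python
import numpy as np


def gaussian_kernel_matrix(points, q):
    """Return K with K[i, j] = exp(-q * ||x_i - x_j||^2).

    `q` is the kernel width parameter (an assumed name for illustration).
    """
    points = np.asarray(points, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, d).
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-q * sq_dists)


# Hypothetical toy data: two tight pairs of points, far apart from each other.
points = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [3.1, 0.0]]

for q in (0.01, 1.0, 100.0):
    K = gaussian_kernel_matrix(points, q)
    # K[0, 1] compares near neighbours; K[0, 2] compares distant points.
    print(f"q={q}: near={K[0, 1]:.4f}, far={K[0, 2]:.4f}")
```

For small q every pair of points looks similar, and for large q even near neighbours look dissimilar, which is why the kernel width strongly influences how many clusters SVC finds.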

Notes

  1. MATLAB is a registered trademark of The MathWorks, Inc.

  2. The feature space is a Hilbert space, so the Pythagorean theorem holds.

  3. MATLAB is a registered trademark of The MathWorks, Inc.

Author information

Corresponding author

Correspondence to Sei-Hyung Lee.

About this article

Cite this article

Lee, SH., Daniels, K.M. Gaussian kernel width exploration and cone cluster labeling for support vector clustering. Pattern Anal Applic 15, 327–344 (2012). https://doi.org/10.1007/s10044-011-0244-8

  • DOI: https://doi.org/10.1007/s10044-011-0244-8
