Symbolic Cluster Representations for SVM in Credit Client Classification Tasks

  • Conference paper
Statistical Models for Data Analysis

Abstract

Credit client scoring on medium-sized data sets can be accomplished by means of Support Vector Machines (SVM), a powerful and robust machine learning method. However, real-life credit client data sets are usually huge, containing up to hundreds of thousands of records, with good credit clients vastly outnumbering the defaulting ones. Such data pose severe computational barriers for SVM and other kernel methods, especially if all pairwise similarities between data points are required. Hence, methods that avoid extensive training on the complete data are in high demand. One possible solution is a combined clustering and classification approach. Computationally efficient clustering can compress the information in a large data set in a robust way, especially in conjunction with a symbolic cluster representation. Credit client data clustered with this procedure are then used to estimate classification models.
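The compression step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or which symbolic representation is used, so everything here is an assumption made for illustration: plain k-means run separately within each class, clusters described by per-feature intervals [min, max] plus a size count, and made-up toy data. The helper names (`kmeans`, `symbolic_description`, `compress`) are hypothetical, and the subsequent SVM training on the compressed data is omitted.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Recompute centers; keep the old center if a cluster became empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]

def symbolic_description(cluster):
    """Assumed symbolic form: per-feature interval (min, max) plus cluster size."""
    intervals = [(min(col), max(col)) for col in zip(*cluster)]
    return {"intervals": intervals, "size": len(cluster)}

def compress(records, labels, k_per_class):
    """Cluster each class separately and return (description, label) pairs."""
    compressed = []
    for label in sorted(set(labels)):
        pts = [r for r, y in zip(records, labels) if y == label]
        k = min(k_per_class, len(pts))
        for cl in kmeans(pts, k):
            compressed.append((symbolic_description(cl), label))
    return compressed

# Toy, made-up data: 200 "good" clients and 20 "defaulting" clients, two features.
rng = random.Random(1)
good = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
bad = [(rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)) for _ in range(20)]
records = good + bad
labels = [0] * len(good) + [1] * len(bad)

compressed = compress(records, labels, k_per_class=5)
print(len(records), "records compressed to", len(compressed), "symbolic clusters")
```

Clustering each class on its own keeps the rare defaulting class from being absorbed into clusters dominated by good clients. A classifier such as an SVM would then be trained on the cluster descriptions (for example, interval midpoints weighted by cluster size) instead of on all original records; that choice is likewise an assumption, not something the abstract specifies.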

Corresponding author

Correspondence to Ralf Stecking.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Stecking, R., Schebesch, K.B. (2013). Symbolic Cluster Representations for SVM in Credit Client Classification Tasks. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_40
