
Improving Classifier Performance by Using Fictitious Training Data? A Case Study

Conference paper in Operations Research Proceedings 2007 (book series: Operations Research Proceedings, volume 2007)

Abstract

Many empirical data sets describing features of persons or objects together with associated class labels (e.g. credit client features and the recorded defaulting behavior in our application [5], [6]) are clearly not linearly separable. However, because relatively sparse data (relative to high-dimensional input feature spaces) interact with a validation procedure such as leave-one-out, a nonlinear classifier can in many cases improve on this only marginally. Attributing all the remaining errors to noise seems rather implausible, as the data are recorded offline and are not prone to the kind of errors that occur, e.g., when process data are measured with (online) sensors. Experiments with classification models on input subsets even suggest that our credit client data contain some hidden redundancy. This redundancy was not eliminated by statistical data preprocessing, and it leads to rather competitive validated models on input subsets and even to slightly superior results for combinations of such input subset base models [3]. These base models all reflect different views of the same data. However, class regions with highly nonlinear boundaries can also occur if important features (i.e. other explanatory factors) are for some reason unavailable (unknown, neglected, etc.). To see this, simply project linearly separable data onto a feature subset of smaller dimension.
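The closing remark of the abstract can be checked on a toy example. The sketch below is a hypothetical illustration, not the authors' data or method: eight 2-D points labelled by the sign of x1 + x2 are linearly separable, but after the second feature is dropped the two classes alternate along x1. The names `POINTS`, `LABELS`, `linearly_separated`, `best_threshold_errors` and `label_changes` are assumptions introduced only for this illustration.

```python
"""Toy illustration: linearly separable 2-D data become interleaved after projection onto one feature."""

# Hypothetical toy data (not from the paper): label = +1 if x1 + x2 > 0 else -1.
POINTS = [(-2.0, 3.0), (-1.5, 1.0), (-1.0, 2.0), (-0.5, 0.0),
          (0.0, 1.0), (0.5, -1.0), (1.0, 0.0), (1.5, -2.0)]
LABELS = [1 if x1 + x2 > 0 else -1 for x1, x2 in POINTS]


def linearly_separated(points, labels, w, b):
    """True if the hyperplane w.x + b = 0 puts every point strictly on its label's side."""
    return all(y * (w[0] * x1 + w[1] * x2 + b) > 0 for (x1, x2), y in zip(points, labels))


def best_threshold_errors(xs, ys):
    """Fewest training errors of any single-threshold (linear) rule on a 1-D feature."""
    cuts = sorted(set(xs))
    # Candidate thresholds: below all values, between neighbours, above all values.
    thresholds = [cuts[0] - 1.0] + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])] + [cuts[-1] + 1.0]
    best = len(xs)
    for t in thresholds:
        for sign in (1, -1):  # sign = label predicted above the threshold
            errors = sum(1 for x, y in zip(xs, ys) if (sign if x > t else -sign) != y)
            best = min(best, errors)
    return best


def label_changes(xs, ys):
    """Number of class switches along the sorted 1-D feature (boundary points an error-free rule needs)."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)


if __name__ == "__main__":
    x1_only = [x1 for x1, _ in POINTS]  # project onto the first feature, dropping x2
    print("separable in 2-D:", linearly_separated(POINTS, LABELS, w=(1.0, 1.0), b=0.0))
    print("best threshold errors on x1:", best_threshold_errors(x1_only, LABELS))
    print("label changes along x1:", label_changes(x1_only, LABELS))
```

Running the script should print `True`, `3` and `7`: the hyperplane x1 + x2 = 0 separates the full data perfectly, whereas on x1 alone the best linear (single-threshold) rule still misclassifies 3 of 8 points, and an error-free rule would need 7 boundary points, i.e. a highly nonlinear boundary.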


References

  1. Duin, R.P.W. and Pekalska, E. (2005): Open issues in pattern recognition. Available at: www-ict.ewi.tudelf.nl/~duin/papers/cores_05_open_issues.pdf

  2. Schebesch, K.B. and Stecking, R. (2007): Selecting SVM Kernels and Input Variable Subsets in Credit Scoring Models. In: Decker, R., Lenz, H.-J. (Eds.): Advances in Data Analysis. Springer, Berlin, 179–186.

  3. Schebesch, K.B. and Stecking, R. (2007): Using Multiple SVM Models for Unbalanced Credit Scoring Data Sets. Proceedings of the 31st International GfKl Conference, Freiburg.

  4. Schölkopf, B. and Smola, A. (2002): Learning with Kernels. The MIT Press, Cambridge.

  5. Stecking, R. and Schebesch, K.B. (2003): Support Vector Machines for Credit Scoring: Comparing to and Combining with some Traditional Classification Methods. In: Schader, M., Gaul, W., Vichi, M. (Eds.): Between Data Science and Applied Data Analysis. Springer, Berlin, 604–612.

  6. Stecking, R. and Schebesch, K.B. (2006): Comparing and Selecting SVM-Kernels for Credit Scoring. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (Eds.): From Data and Information Analysis to Knowledge Engineering. Springer, Berlin, 542–549.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg


Cite this paper

Stecking, R., Schebesch, K.B. (2008). Improving Classifier Performance by Using Fictitious Training Data? A Case Study. In: Kalcsics, J., Nickel, S. (eds) Operations Research Proceedings 2007. Operations Research Proceedings, vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77903-2_14
