Abstract
Many empirical data sets describing features of persons or objects with associated class labels (e.g. credit client features and the recorded defaulting behavior in our application [5], [6]) are clearly not linearly separable. However, owing to an interplay of relatively sparse data (relative to high-dimensional input feature spaces) and a validation procedure such as leave-one-out, nonlinear classification can in many cases improve this situation only marginally. Attributing all remaining errors to noise seems rather implausible, as data recording is offline and not prone to the kinds of errors that occur, e.g., when measuring process data with (online) sensors. Experiments with classification models on input subsets even suggest that our credit client data contain some hidden redundancy, which was not eliminated by statistical data preprocessing and which leads to rather competitive validated models on input subsets, and even to slightly superior results for combinations of such input-subset base models [3]. These base models all reflect different views of the same data. However, class regions with highly nonlinear boundaries can also occur if important features (i.e. other explanatory factors) are for some reason not available (unknown, neglected, etc.). To see this, simply project linearly separable data onto a feature subset of smaller dimension.
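The abstract's closing observation can be illustrated with a minimal sketch (not from the paper; the construction and all names are illustrative): data that are linearly separable in two dimensions generally cease to be separable once one of the two features is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes, linearly separable in 2D by the line x1 + x2 = 0,
# with an enforced margin: class 0 has x1 + x2 < -0.5, class 1 has
# x1 + x2 > 0.5.
n = 200
X = rng.uniform(-1, 1, size=(n, 2))
margin = X[:, 0] + X[:, 1]
keep = np.abs(margin) > 0.5            # discard points inside the margin
X, y = X[keep], (margin[keep] > 0).astype(int)

# By construction the 2D classes are perfectly linearly separable.
separable_2d = (X[y == 1].sum(axis=1) > 0).all() and (X[y == 0].sum(axis=1) < 0).all()

# Now "lose" the second feature: project onto x1 alone. The 1D class
# intervals overlap, so no threshold on x1 can separate the classes.
x1_class0 = X[y == 0, 0]
x1_class1 = X[y == 1, 0]
overlap_1d = x1_class0.max() > x1_class1.min()

print(separable_2d, overlap_1d)
```

With these settings the projection produces overlapping class intervals on the remaining feature, so a classifier restricted to that subset would need a nonlinear (here: non-threshold) boundary or must accept errors, mirroring the situation described for the credit client data.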
References
[1] Duin, R.P.W. and Pekalska, E. (2005): Open Issues in Pattern Recognition. To be found at: www-ict.ewi.tudelf.nl/~duin/papers/cores_05_open_issues.pdf
[2] Schebesch, K.B. and Stecking, R. (2007): Selecting SVM Kernels and Input Variable Subsets in Credit Scoring Models. In: Decker, R., Lenz, H.-J. (Eds.): Advances in Data Analysis. Springer, Berlin, 179–186.
[3] Schebesch, K.B. and Stecking, R. (2007): Using Multiple SVM Models for Unbalanced Credit Scoring Data Sets. Proceedings of the 31st International GfKl Conference, Freiburg.
[4] Schölkopf, B. and Smola, A. (2002): Learning with Kernels. The MIT Press, Cambridge.
[5] Stecking, R. and Schebesch, K.B. (2003): Support Vector Machines for Credit Scoring: Comparing to and Combining with some Traditional Classification Methods. In: Schader, M., Gaul, W., Vichi, M. (Eds.): Between Data Science and Applied Data Analysis. Springer, Berlin, 604–612.
[6] Stecking, R. and Schebesch, K.B. (2006): Comparing and Selecting SVM-Kernels for Credit Scoring. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (Eds.): From Data and Information Analysis to Knowledge Engineering. Springer, Berlin, 542–549.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Stecking, R., Schebesch, K.B. (2008). Improving Classifier Performance by Using Fictitious Training Data? A Case Study. In: Kalcsics, J., Nickel, S. (eds) Operations Research Proceedings 2007. Operations Research Proceedings, vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77903-2_14