Abstract
Many empirical data sets describing features of persons or objects with associated class labels (e.g. credit client features and the recorded defaulting behavior in our application [5], [6]) are clearly not linearly separable. However, owing to an interplay of relatively sparse data (relative to high-dimensional input feature spaces) and a validation procedure such as leave-one-out, nonlinear classification can in many cases improve this situation only marginally. Attributing all remaining errors to noise seems rather implausible, as data recording is offline and not prone to the kinds of errors that occur, e.g., when measuring process data with (online) sensors. Experiments with classification models on input subsets even suggest that our credit client data contain some hidden redundancy, which was not eliminated by statistical data preprocessing and which leads to rather competitive validated models on input subsets, and even to slightly superior results for combinations of such input-subset base models [3]. These base models all reflect different views of the same data. However, class regions with highly nonlinear boundaries can also occur if important features (i.e. other explanatory factors) are for some reason not available (unknown, neglected, etc.). To see this, simply project linearly separable data onto a feature subset of smaller dimension.
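The abstract's closing observation can be illustrated with a minimal sketch (not from the paper; the construction and all names are illustrative): data that are linearly separable in two dimensions generally cease to be separable once one of the two features is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes, linearly separable in 2D by the line x1 + x2 = 0,
# with an enforced margin: class 0 has x1 + x2 < -0.5, class 1 has
# x1 + x2 > 0.5.
n = 200
X = rng.uniform(-1, 1, size=(n, 2))
margin = X[:, 0] + X[:, 1]
keep = np.abs(margin) > 0.5            # discard points inside the margin
X, y = X[keep], (margin[keep] > 0).astype(int)

# By construction the 2D classes are perfectly linearly separable.
separable_2d = (X[y == 1].sum(axis=1) > 0).all() and (X[y == 0].sum(axis=1) < 0).all()

# Now "lose" the second feature: project onto x1 alone. The 1D class
# intervals overlap, so no threshold on x1 can separate the classes.
x1_class0 = X[y == 0, 0]
x1_class1 = X[y == 1, 0]
overlap_1d = x1_class0.max() > x1_class1.min()

print(separable_2d, overlap_1d)
```

With these settings the projection produces overlapping class intervals on the remaining feature, so a classifier restricted to that subset would need a nonlinear (here: non-threshold) boundary or must accept errors, mirroring the situation described for the credit client data.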
References
[1] Duin, R.P.W. and Pekalska, E. (2005): Open Issues in Pattern Recognition. To be found at: www-ict.ewi.tudelf.nl/~duin/papers/cores_05_open_issues.pdf
[2] Schebesch, K.B. and Stecking, R. (2007): Selecting SVM Kernels and Input Variable Subsets in Credit Scoring Models. In: Decker, R., Lenz, H.-J. (Eds.): Advances in Data Analysis. Springer, Berlin, 179–186.
[3] Schebesch, K.B. and Stecking, R. (2007): Using Multiple SVM Models for Unbalanced Credit Scoring Data Sets. Proceedings of the 31st International GfKl Conference, Freiburg.
[4] Schölkopf, B. and Smola, A. (2002): Learning with Kernels. The MIT Press, Cambridge.
[5] Stecking, R. and Schebesch, K.B. (2003): Support Vector Machines for Credit Scoring: Comparing to and Combining with some Traditional Classification Methods. In: Schader, M., Gaul, W., Vichi, M. (Eds.): Between Data Science and Applied Data Analysis. Springer, Berlin, 604–612.
[6] Stecking, R. and Schebesch, K.B. (2006): Comparing and Selecting SVM-Kernels for Credit Scoring. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (Eds.): From Data and Information Analysis to Knowledge Engineering. Springer, Berlin, 542–549.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Stecking, R., Schebesch, K.B. (2008). Improving Classifier Performance by Using Fictitious Training Data? A Case Study. In: Kalcsics, J., Nickel, S. (eds) Operations Research Proceedings 2007. Operations Research Proceedings, vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77903-2_14