Abstract
The main objective of this paper is to investigate the relationship between training sample size and the predictive power of well-known classification techniques. We first illustrate this relationship using the results of several empirical studies and then propose a general mathematical model that explains it. We then validate the model on several real data sets and find that it fits the data well. The model also allows a more objective determination of the optimum training sample size, in contrast to current approaches to selecting training sample size, which tend to be ad hoc or subjective.
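As an illustration of the kind of analysis the abstract describes, the sketch below fits a saturating learning curve of the form P(n) = P_max - (P_max - P_0)e^(-kn) to accuracy measurements taken at increasing training sample sizes, and reads off the sample size beyond which the marginal accuracy gain drops below a chosen threshold. The functional form, the synthetic data, and the threshold are illustrative assumptions, not the paper's exact model or results.

# Hypothetical sketch: fit a saturating learning curve to (sample size, accuracy)
# pairs and estimate a point of diminishing returns. The functional form and the
# data below are assumptions for illustration, not the authors' reported model.
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(n, p_max, p_start, k):
    """Predictive power as a saturating function of training sample size n."""
    return p_max - (p_max - p_start) * np.exp(-k * n)

# Synthetic example data: training sample sizes and observed accuracies.
sizes = np.array([100, 250, 500, 1000, 2000, 4000, 8000], dtype=float)
accuracy = np.array([0.62, 0.68, 0.73, 0.77, 0.80, 0.82, 0.83])

# Fit the three parameters; initial guesses: ceiling 0.85, start 0.60, rate 1e-3.
params, _ = curve_fit(learning_curve, sizes, accuracy, p0=[0.85, 0.60, 1e-3])
p_max, p_start, k = params

# One possible "optimum" sample size: where the marginal gain in accuracy per
# additional training case falls below a chosen threshold t, i.e. solve
# k * (p_max - p_start) * exp(-k * n) = t for n.
t = 1e-5
n_opt = np.log(k * (p_max - p_start) / t) / k
print(f"Fitted ceiling={p_max:.3f}, rate={k:.5f}, suggested n ~ {n_opt:.0f}")

With the synthetic data above, the fitted curve levels off near 0.83-0.85 and the suggested sample size lands in the low thousands; the threshold t encodes how much an extra training case must be worth before collecting more data stops paying off.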
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boonyanunta, N., Zeephongsekul, P. (2004). Predicting the Relationship Between the Size of Training Sample and the Predictive Power of Classifiers. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science, vol 3215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30134-9_71
DOI: https://doi.org/10.1007/978-3-540-30134-9_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23205-6
Online ISBN: 978-3-540-30134-9