Finding Robust Models Using a Stratified Design

Baxter, Rohan A.

doi:10.1007/11941439_123

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Predictive performance in model selection is often estimated using out-of-sample validation and test datasets. This assumes that the validation and test datasets are drawn from the same population as the training dataset. The assumption may not hold in the common application context where the model is used to score future data. This paper proposes a sample design that can lead to better model performance and robust estimates of model generalization error. The design is demonstrated on a collection scoring application.
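
The gap the abstract describes can be illustrated with a minimal sketch. It is not the paper's stratified design, which the abstract does not spell out. It only contrasts a random holdout, where test records share time periods with training records, against an out-of-time holdout, where the test records come from a later period, as they would when scoring future data. The record layout, the `period` field, and both function names are assumptions made for illustration.

```python
import random


def random_holdout(records, test_fraction=0.25, seed=0):
    """Shuffle and split, so test rows come from the same periods as training rows."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def out_of_time_holdout(records, cutoff_period):
    """Train on periods before the cutoff, test on the cutoff period and later."""
    train = [r for r in records if r["period"] < cutoff_period]
    test = [r for r in records if r["period"] >= cutoff_period]
    return train, test


# Hypothetical toy data: 4 periods with 5 records each.
records = [{"id": p * 10 + i, "period": p} for p in range(1, 5) for i in range(5)]

rand_train, rand_test = random_holdout(records)
time_train, time_test = out_of_time_holdout(records, cutoff_period=4)
```

With the out-of-time split, every test record is later than every training record, so an error estimate computed on `time_test` is closer to the scoring setting than one computed on `rand_test`.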

Author information

Authors and Affiliations

Analytics Project, Office of the Chief Knowledge Officer, Australian Taxation Office, P.O. Box 900, Civic Square, ACT, 2608
Rohan A. Baxter

Editor information

Editors and Affiliations

DisPRR, National ICT Australia Ltd, QLD, Australia
Abdul Sattar
School of Computing, University of Tasmania, Sandy Bay, 7005, Tasmania, Australia
Byeong-ho Kang

About this paper

Cite this paper

Baxter, R.A. (2006). Finding Robust Models Using a Stratified Design. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_123

DOI: https://doi.org/10.1007/11941439_123
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)
