Efficient Sampling and Handling of Variance in Tuning Data Mining Models

  • Conference paper
Parallel Problem Solving from Nature - PPSN XII (PPSN 2012)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7491)

Abstract

Computational Intelligence (CI) provides robust, well-performing solutions for global optimization. CI is especially suited to difficult parameter-optimization tasks in which the fitness function is noisy. Such fitness landscapes frequently arise in real-world applications like Data Mining (DM). Unfortunately, parameter tuning in DM is computationally expensive, and CI-based methods often require many function evaluations before they converge to good solutions. Earlier studies have shown that surrogate models can reduce the number of real function evaluations; however, each remaining function evaluation is still time-consuming. In this paper we investigate whether and how the fitness landscape of the parameter space changes when fewer observations are used for model training during tuning. A representative study on seven DM tasks shows that the results remain competitive: on all of these tasks, a fraction of 10-15% of the training data is sufficient, which reduces the computation time by a factor of 6-10.
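To make the idea in the abstract concrete, the following is a minimal sketch (not the authors' code) of tuning on a subsample: candidate parameter settings are evaluated on a small random fraction of the training data, and only the best configuration is refit on the full set. The SVM learner, the simple candidate grid standing in for a tuner, and the 15% subsample ratio are illustrative assumptions.

```python
# Sketch: evaluate tuning candidates on a subsample, refit the winner on all data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a DM task.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Keep only a fraction of the training data for the (expensive) tuning loop.
subsample_frac = 0.15  # assumed ratio, in the 10-15% range reported in the paper
X_sub, _, y_sub, _ = train_test_split(
    X, y, train_size=subsample_frac, stratify=y, random_state=0)

# Candidate hyperparameters; in practice a tuner (e.g. SPO, CMA-ES) proposes these,
# here a small grid stands in for that loop.
candidates = [{"C": c, "gamma": g}
              for c in (0.1, 1.0, 10.0)
              for g in (0.01, 0.1, 1.0)]

def fitness(params, X_train, y_train):
    """Noisy fitness: mean cross-validated accuracy of an SVM on the given data."""
    model = SVC(kernel="rbf", **params)
    return cross_val_score(model, X_train, y_train, cv=3).mean()

# Tuning is done on the subsample only, which is what cuts the runtime.
best = max(candidates, key=lambda p: fitness(p, X_sub, y_sub))

# The final model is trained once on all available data.
final_model = SVC(kernel="rbf", **best).fit(X, y)
print("best parameters found on the subsample:", best)
```

The design point is that only the repeated fitness evaluations inside the tuning loop use the reduced data set; the single final training run still sees every observation.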

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Koch, P., Konen, W. (2012). Efficient Sampling and Handling of Variance in Tuning Data Mining Models. In: Coello, C.A.C., Cutello, V., Deb, K., Forrest, S., Nicosia, G., Pavone, M. (eds) Parallel Problem Solving from Nature - PPSN XII. PPSN 2012. Lecture Notes in Computer Science, vol 7491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32937-1_20

  • DOI: https://doi.org/10.1007/978-3-642-32937-1_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32936-4

  • Online ISBN: 978-3-642-32937-1

  • eBook Packages: Computer Science, Computer Science (R0)
