Abstract
Computational Intelligence (CI) provides robust working solutions for global optimization and is especially well suited to parameter optimization when the fitness function is noisy. Such fitness landscapes frequently arise in real-world applications like Data Mining (DM). Unfortunately, parameter tuning in DM is computationally expensive, and CI-based methods often require many function evaluations before they converge to good solutions. Earlier studies have shown that surrogate models can reduce the number of real function evaluations; however, each remaining evaluation is still time-consuming. In this paper we investigate if and how the fitness landscape of the parameter space changes when fewer observations are used for model training during tuning. A representative study on seven DM tasks shows that the results remain competitive: on all tasks, a fraction of 10-15% of the training data is sufficient, reducing the computation time by a factor of 6-10.
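To make the central idea concrete, the following is a minimal sketch (in Python with scikit-learn, which the paper itself does not necessarily use): each tuner evaluation trains the DM model on a random 15% subsample of the training data, so the per-evaluation cost drops roughly by the subsampling factor. The dataset, the SVM model, and the plain random-search loop are illustrative stand-ins for the paper's actual tasks and tuner.

```python
# Sketch of subsampled fitness evaluation during parameter tuning.
# All names below are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

def fitness(log_C, log_gamma, frac=0.15):
    """Noisy fitness: CV error of an SVM trained on a random subsample."""
    n = max(50, int(frac * len(X)))
    idx = rng.choice(len(X), size=n, replace=False)
    model = SVC(C=10.0**log_C, gamma=10.0**log_gamma)
    # 3-fold CV accuracy on the subsample only; return error to minimize.
    return 1.0 - cross_val_score(model, X[idx], y[idx], cv=3).mean()

# Random search stands in for the tuner: every evaluation touches only
# ~15% of the data, so tuning cost shrinks roughly by that factor.
best = min(((fitness(c, g), c, g)
            for c, g in rng.uniform([-2, -4], [2, 0], size=(30, 2))),
           key=lambda t: t[0])
print("best CV error %.3f at log10(C)=%.2f, log10(gamma)=%.2f" % best)
```

Because the subsample is redrawn per evaluation, the fitness is itself a noisy function of the parameters, which is exactly the regime the abstract describes.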
Cite this paper
Koch, P., Konen, W. (2012). Efficient Sampling and Handling of Variance in Tuning Data Mining Models. In: Coello, C.A.C., Cutello, V., Deb, K., Forrest, S., Nicosia, G., Pavone, M. (eds) Parallel Problem Solving from Nature - PPSN XII. PPSN 2012. Lecture Notes in Computer Science, vol 7491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32937-1_20