Abstract
We present an application of genetic algorithms to searching the space of model-building parameters in order to optimize the score function, or accuracy, of a predictive data mining model. The goal of predictive modeling is to build a classification or regression model that accurately predicts the value of a target column from the values of the input attributes. Finding an optimal algorithm and its control parameters for building a predictive model is non-trivial for two reasons. First, the number of classification algorithms and their control parameters is very large. Second, building a model can be quite time-consuming for datasets containing a large number of records and attributes. These two reasons make it impractical to enumerate every algorithm and its possible control parameters in search of an optimal model. Genetic algorithms are adaptive heuristic search algorithms that have been successfully applied to optimization problems in diverse domains. In this work, we formulate the problem of finding optimal predictive model-building parameters as an optimization problem and examine the usefulness of genetic algorithms for solving it. We perform experiments on several datasets and report empirical results that show the applicability of genetic algorithms to this problem.
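The abstract does not give the encoding, genetic operators, or score function used in the paper, so the following is only a minimal sketch of the general idea: a genetic algorithm that searches a small space of model-building parameters instead of enumerating it. Everything named here is an assumption made for illustration: the `SEARCH_SPACE` contents, the `toy_score` function (a cheap stand-in for the accuracy of a trained model), and the choice of tournament selection, uniform crossover, and elitism.

```python
import random

# Hypothetical search space: each key is one "gene" (a model-building parameter).
SEARCH_SPACE = {
    "algorithm": ["decision_tree", "rule_learner", "naive_bayes"],
    "max_depth": list(range(1, 21)),
    "min_leaf_size": list(range(1, 51)),
    "pruning": [True, False],
}
GENES = list(SEARCH_SPACE)


def toy_score(params):
    """Stand-in for model accuracy. In a real setting this would train a
    model with `params` and return, for example, its cross-validated accuracy."""
    score = 0.6
    score += 0.1 if params["algorithm"] == "decision_tree" else 0.0
    score += 0.1 * (1 - abs(params["max_depth"] - 8) / 20)
    score += 0.1 * (1 - abs(params["min_leaf_size"] - 5) / 50)
    score += 0.05 if params["pruning"] else 0.0
    return score


def random_individual(rng):
    """Draw one random parameter setting from the search space."""
    return {g: rng.choice(SEARCH_SPACE[g]) for g in GENES}


def tournament(population, scores, rng, k=3):
    """Tournament selection: pick k individuals at random, keep the fittest."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Uniform crossover: each gene is inherited from either parent."""
    return {g: (a[g] if rng.random() < 0.5 else b[g]) for g in GENES}


def mutate(ind, rng, rate=0.1):
    """Resample each gene from its allowed values with probability `rate`."""
    return {
        g: (rng.choice(SEARCH_SPACE[g]) if rng.random() < rate else v)
        for g, v in ind.items()
    }


def genetic_search(fitness, pop_size=20, generations=15, seed=0):
    """Evaluate `generations` populations and return (best_params, best_score)."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        for ind, s in zip(population, scores):
            if s > best_score:
                best, best_score = ind, s
        # Elitism: the best individual seen so far survives unchanged.
        next_population = [dict(best)]
        while len(next_population) < pop_size:
            parent_a = tournament(population, scores, rng)
            parent_b = tournament(population, scores, rng)
            next_population.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_population
    return best, best_score


if __name__ == "__main__":
    params, score = genetic_search(toy_score)
    print(params, round(score, 4))
```

With the default settings this sketch scores at most 300 parameter settings, whereas the toy space contains 3 × 20 × 50 × 2 = 6000 combinations. That gap is the motivation the abstract gives for a heuristic search: when each evaluation means building a model on a large dataset, exhaustive enumeration quickly becomes impractical.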
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sureka, A., Indukuri, K.V. (2008). Using Genetic Algorithms for Parameter Optimization in Building Predictive Data Mining Models. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_25
DOI: https://doi.org/10.1007/978-3-540-88192-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science, Computer Science (R0)