Abstract
The scalability of machine learning (ML) algorithms has become increasingly important due to the ever-increasing size of datasets and the growing complexity of the models induced. Standard approaches to this issue generally involve developing parallel and distributed versions of the ML algorithms and/or reducing dataset sizes via sampling techniques. In this paper we describe an alternative approach that combines features of spatially structured evolutionary algorithms (SSEAs) with the well-known machine learning techniques of ensemble learning and boosting. The result is a powerful and robust framework for parallelizing ML methods that requires no changes to the ML methods themselves. We first describe the framework and illustrate its behavior on a simple synthetic problem, and then evaluate its scalability and robustness using several different ML methods on a set of benchmark problems from the UCI Machine Learning Repository.
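To make the general scheme concrete, here is a minimal sketch (in Python with scikit-learn) of a spatially structured ensemble of unmodified base learners. This is an illustration of the idea outlined in the abstract, not the authors' implementation: the grid size, the von Neumann neighborhood, the diffusion of misclassified examples to neighboring cells, and the choice of shallow decision trees as base learners are all assumptions made for this example.

```python
# Illustrative sketch only (NOT the paper's code). Each cell of a 2-D
# toroidal grid trains an off-the-shelf base learner on a local bootstrap
# sample; a boosting-like step passes each cell's misclassified examples
# to its von Neumann neighbors; prediction is a majority vote over cells.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
GRID = 4  # 4x4 grid of learners (assumption)

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Bootstrap a local training sample for every grid cell.
cells = {}
for i in range(GRID):
    for j in range(GRID):
        idx = rng.integers(0, len(X), size=len(X) // (GRID * GRID))
        cells[(i, j)] = idx

def neighbors(i, j):
    # von Neumann neighborhood on a torus (assumption)
    return [((i - 1) % GRID, j), ((i + 1) % GRID, j),
            (i, (j - 1) % GRID), (i, (j + 1) % GRID)]

models = {}
for _ in range(3):  # a few diffusion rounds (assumption)
    # Train each cell's (unchanged) base learner on its local sample.
    for cell, idx in cells.items():
        models[cell] = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
    # Boosting-like step: send each cell's hard examples to its neighbors.
    new_cells = {c: list(idx) for c, idx in cells.items()}
    for (i, j), idx in cells.items():
        hard = idx[models[(i, j)].predict(X[idx]) != y[idx]]
        for nb in neighbors(i, j):
            new_cells[nb].extend(hard)
    cells = {c: np.asarray(ix) for c, ix in new_cells.items()}

def predict(Xq):
    # Ensemble prediction: majority vote over all cells.
    votes = np.stack([m.predict(Xq) for m in models.values()])
    return (votes.mean(axis=0) > 0.5).astype(int)

print("train accuracy:", (predict(X) == y).mean())
```

Because each cell trains an off-the-shelf learner on its own local sample, the per-cell training steps are embarrassingly parallel, which is the property such a framework exploits; only the data flowing between neighboring cells couples the learners.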
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kamath, U., Kaers, J., Shehu, A., De Jong, K.A. (2012). A Spatial EA Framework for Parallelizing Machine Learning Methods. In: Coello, C.A.C., Cutello, V., Deb, K., Forrest, S., Nicosia, G., Pavone, M. (eds) Parallel Problem Solving from Nature - PPSN XII. PPSN 2012. Lecture Notes in Computer Science, vol 7491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32937-1_21
DOI: https://doi.org/10.1007/978-3-642-32937-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32936-4
Online ISBN: 978-3-642-32937-1
eBook Packages: Computer Science, Computer Science (R0)