Towards a Scalable Distributed Fitness Evaluation Service

Funika, Włodzimierz; Koperek, Paweł

doi:10.1007/978-3-319-32149-3_46

Włodzimierz Funika & Paweł Koperek

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9573)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

Organizations across the globe gather more and more data. Large datasets require new approaches to analysis and processing, including methods based on machine learning. In particular, symbolic regression can provide many useful insights. Unfortunately, due to its high resource requirements, using this method on large datasets may be infeasible. In this paper we analyze a bottleneck in an open-source implementation of this method, which we call hubert. We identify the evaluation of individuals as the most costly operation. As a solution to this problem, we propose a new evaluation service based on the Apache Spark framework, which attempts to speed up computations by distributing them across a cluster of machines. We assess the performance of the service by comparing the execution times of both implementations for a number of samples. Then we discuss how the computation time improves as the amount of resources increases. Finally, we draw conclusions and outline plans for further research.
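
The idea of distributing the evaluation of an individual can be illustrated with a minimal sketch. It is not the hubert code and does not use Apache Spark: it assumes a one-variable dataset, uses the mean squared error as the fitness measure, and stands in a local thread pool for the cluster. All names (`partial_error`, `split`, `mean_squared_error`) are hypothetical. Each partition of the data is scored independently (the map step) and the partial results are then combined into a single fitness value (the reduce step).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Assumption: each sample is (x, observed y) for a one-variable problem.
Sample = Tuple[float, float]
Candidate = Callable[[float], float]


def partial_error(candidate: Candidate,
                  chunk: Sequence[Sample]) -> Tuple[float, int]:
    """Map step: squared-error sum and sample count for one partition."""
    total = sum((candidate(x) - y) ** 2 for x, y in chunk)
    return total, len(chunk)


def split(data: Sequence[Sample], parts: int) -> List[Sequence[Sample]]:
    """Split the dataset into at most `parts` contiguous partitions."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def mean_squared_error(candidate: Candidate,
                       data: Sequence[Sample],
                       parts: int = 4) -> float:
    """Reduce step: combine per-partition results into one fitness value."""
    if not data:
        raise ValueError("cannot evaluate a candidate on an empty dataset")
    chunks = split(data, parts)
    # A thread pool stands in for the cluster workers in this sketch.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = list(pool.map(lambda c: partial_error(candidate, c), chunks))
    total = sum(r[0] for r in results)
    count = sum(r[1] for r in results)
    return total / count


# Hypothetical data generated from y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(100)]
exact = mean_squared_error(lambda x: 2.0 * x + 1.0, data)
off_by_one = mean_squared_error(lambda x: 2.0 * x, data)
```

Because every partition returns only a sum and a count, the amount of data sent back to the coordinator does not depend on the partition size. This is why the evaluation step is a natural candidate for distribution.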

Acknowledgement

We would like to thank Dr. Maciej Malawski for his valuable help with the Amazon EC2 experiments. This research is supported by AGH grant no. 11.11.230.124 as well as by the PLGrid Core project.

Author information

Authors and Affiliations

ACC CYFRONET AGH, AGH, ul. Nawojki 11, 30-950, Kraków, Poland
Włodzimierz Funika
Faculty of Computer Science, Electronics and Telecommunication, Department of Computer Science, AGH, al. Mickiewicza 30, 30-059, Kraków, Poland
Włodzimierz Funika & Paweł Koperek

Corresponding author

Correspondence to Włodzimierz Funika.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
Department of Computer Science, University of Southern California, Marina Del Rey, California, USA
Ewa Deelman
Electrical Engineering & Comput. Science, University of Tennessee, Knoxville, Tennessee, USA
Jack Dongarra
Czestochowa University of Technology, Institute of Computer & Information Sci., Czestochowa, Poland
Konrad Karczewski
Department of Computer Science, AGH University of Science and Technology, Krakow, Poland
Jacek Kitowski
Systèmes d’informations, Big Data et Rec, AGH University of Science and Technology, Krakow, Poland
Kazimierz Wiatr

About this paper

Cite this paper

Funika, W., Koperek, P. (2016). Towards a Scalable Distributed Fitness Evaluation Service. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_46

DOI: https://doi.org/10.1007/978-3-319-32149-3_46
Published: 02 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32148-6
Online ISBN: 978-3-319-32149-3
eBook Packages: Computer Science, Computer Science (R0)