Abstract
Optimizing parallel programs is a complex task because the interference among many different parameters. Work-stealing runtimes, used to dynamically balance load among different processor cores, are no exception. This work explores the automatic configuration of the following runtime parameters: dynamic granularity control algorithms, granularity control cache, work-stealing algorithm, lazy binary splitting parameter, the maximum queue size and the unparking interval. The performance of the program is highly sensible to the granularity control algorithm, which can be a combination of other granularity algorithms. In this work, we address two search-based problems: finding a globally efficient work-stealing configuration, and finding the best configuration just for an individual program. For both problems, we propose the use of a Genetic Algorithm (GA). The genotype of the GA is able to represent combinations of up to three cut-off algorithms, as well as other work-stealing parameters.
The proposed GA has been evaluated in its ability to obtain a more efficient solution across a set of programs, in its ability to generalize the solution to a larger set of programs, and its ability to evolve single programs individually.
The GA was able to improve the performance of the set of programs in the training set, but the obtained configurations were not generalized to a larger benchmark set. However, it was able to successfully improve the performance of each program individually.
This is a preview of subscription content, log in via an institution.
References
Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996)
Dagum, L., Menon, R.: Openmp: an industry standard api for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998)
Lea, D.: A java fork/join framework. In: Proceedings of the ACM 2000 Conference on Java Grande, pp. 36–43. ACM (2000)
Charles, P., Grothoff, C., Saraswat, V., Donawa, C., Kielstra, A., Ebcioglu, K., Von Praun, C., Sarkar, V.: X10: an object-oriented approach to non-uniform cluster computing. In: ACM Sigplan Notices, vol. 40, pp. 519–538. ACM (2005)
Stork, S., Naden, K., Sunshine, J., Mohr, M., Fonseca, A., Marques, P., Aldrich, J.: Æminium: a permission-based concurrent-by-default programming language approach. ACM Trans. Program. Lang. Syst. (TOPLAS) 36(1), 2 (2014)
Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the cilk-5 multithreaded language. In: ACM Sigplan Notices, vol. 33, pp. 212–223. ACM (1998)
Mohr, E., Kranz, D.A., Halstead, R.H.: Lazy task creation: a technique for increasing the granularity of parallel programs. IEEE Trans. Parallel Distrib. Syst. 2(3), 264–280 (1991)
Duran, A., Corbalán, J., Ayguadé, E.: Evaluation of OpenMP task scheduling strategies. In: Eigenmann, R., Supinski, B.R. (eds.) IWOMP 2008. LNCS, vol. 5004, pp. 100–110. Springer, Heidelberg (2008). doi:10.1007/978-3-540-79561-2_9
Duran, A., Corbalán, J., Ayguadé, E.: An adaptive cut-off for task parallelism. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, p. 36. IEEE Press (2008)
Fonseca, A., Cabral, B.: Evaluation of runtime cut-off approaches for parallel programs. In: VECPAR 2016 Proceedings (2016)
Miller, B.L., Goldberg, D.E.: Genetic algorithms, tournament selection, and the effects of noise. Complex Syst. 9(3), 193–212 (1995)
DeJong, K.: An analysis of the behavior of a class of genetic adaptive systems. Ph.D. Thesis, University of Michigan (1975)
Olivier, S.L., Prins, J.F.: Evaluating OpenMP 3.0 run time systems on unbalanced task graphs. In: Müller, M.S., Supinski, B.R., Chapman, B.M. (eds.) IWOMP 2009. LNCS, vol. 5568, pp. 63–78. Springer, Heidelberg (2009). doi:10.1007/978-3-642-02303-3_6
Tchiboukdjian, M., Danjean, V., Gautier, T., Mentec, F., Raffin, B.: A work stealing scheduler for parallel loops on shared cache multicores. In: Guarracino, M.R., et al. (eds.) Euro-Par 2010. LNCS, vol. 6586, pp. 99–107. Springer, Heidelberg (2011). doi:10.1007/978-3-642-21878-1_13
Cong, G., Kodali, S., Krishnamoorthy, S., Lea, D., Saraswat, V., Wen, T.: Solving large, irregular graph problems using adaptive work-stealing. In: 2008 37th International Conference on Parallel Processing, pp. 536–545. IEEE (2008)
Wang, L., Cui, H., Duan, Y., Lu, F., Feng, X., Yew, P.C.: An adaptive task creation strategy for work-stealing scheduling. In: Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 266–277. ACM (2010)
Chen, S., Gibbons, P.B., Kozuch, M., Liaskovitis, V., Ailamaki, A., Blelloch, G.E., Falsafi, B., Fix, L., Hardavellas, N., Mowry, T.C., et al.: Scheduling threads for constructive cache sharing on cmps. In: Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 105–115. ACM (2007)
Ahmad, I., Dhodhi, M.K.: Multiprocessor scheduling in a genetic paradigm. Parallel Comput. 22(3), 395–406 (1996)
Kwok, Y.K., Ahmad, I.: Efficient scheduling of arbitrary task graphs to multiprocessors using a parallel genetic algorithm. J. Parallel Distrib. Comput. 47(1), 58–77 (1997)
Wang, L., Siegel, H.J., Roychowdhury, V.P., Maciejewski, A.A.: Task matching and scheduling in heterogeneous computing environments using a genetic-algorithm-based approach. J. Parallel Distrib. Comput. 47(1), 8–22 (1997)
Corrêa, R.C., Ferreira, A., Rebreyend, P.: Scheduling multiprocessor tasks with genetic algorithms. IEEE Trans. Parallel Distrib. Syst. 10(8), 825–837 (1999)
Omara, F.A., Arafa, M.M.: Genetic algorithms for task scheduling problem. J. Parallel Distrib. Comput. 70(1), 13–22 (2010)
Mezmaz, M., Melab, N., Kessaci, Y., Lee, Y.C., Talbi, E.G., Zomaya, A.Y., Tuyttens, D.: A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems. J. Parallel Distrib. Comput. 71(11), 1497–1508 (2011)
Sheikh, H.F., Ahmad, I., Fan, D.: An evolutionary technique for performance-energy-temperature optimized scheduling of parallel tasks on multi-core processors. IEEE Trans. Parallel Distrib. Syst. 27(3), 668–681 (2016)
Langdon, W.B., Harman, M.: Genetically improved CUDA C++ software. In: Nicolau, M., Krawiec, K., Heywood, M.I., Castelli, M., García-Sánchez, P., Merelo, J.J., Rivas Santos, V.M., Sim, K. (eds.) EuroGP 2014. LNCS, vol. 8599, pp. 87–99. Springer, Heidelberg (2014). doi:10.1007/978-3-662-44303-3_8
Le Goues, C., Nguyen, T., Forrest, S., Weimer, W.: Genprog: a generic method for automatic software repair. IEEE Trans. Software Eng. 38(1), 54–72 (2012)
Ryan, C., Ivan, L., Koza, J.R., Banzhaf, W.: Automatic parallelization of loops in sequential programs using genetic programming. In: Genetic Programming 1998: Proceedings of the Third, pp. 344–349. Morgan Kaufmann (1998)
Ryan, C., Ivan, L.: Automatic parallelization of arbitrary programs. In: Poli, R., Nordin, P., Langdon, W.B., Fogarty, T.C. (eds.) EuroGP 1999. LNCS, vol. 1598, pp. 244–254. Springer, Heidelberg (1999). doi:10.1007/3-540-48885-5_21
Acknowledgments
The first author was supported by the Portuguese National Foundation for Science and Technology (FCT) through a Doctoral Grant (SFRH/BD/84448/2012).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Fonseca, A., Lourenço, N., Cabral, B. (2017). Evolving Cut-Off Mechanisms and Other Work-Stealing Parameters for Parallel Programs. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science(), vol 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_49
Download citation
DOI: https://doi.org/10.1007/978-3-319-55849-3_49
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55848-6
Online ISBN: 978-3-319-55849-3
eBook Packages: Computer ScienceComputer Science (R0)