ABSTRACT
Optimized software execution on parallel computing systems demands consideration of many parameters at run-time. Determining the optimal set of parameters in a given execution context is a complex task; to address it, researchers have proposed various approaches based on heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze, and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for scheduling parallel computing systems. Additionally, we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of scheduling parallel computing systems. Furthermore, they may aid in understanding the limitations of existing approaches and in identifying areas for improvement.
Index Terms
- A Review of Machine Learning and Meta-heuristic Methods for Scheduling Parallel Computing Systems