ABSTRACT
Program autotuning is becoming an increasingly valuable tool for improving performance portability across diverse target architectures, exploring trade-offs between several criteria, or meeting quality of service requirements. Recent work on general autotuning frameworks has enabled rapid development of domain-specific autotuners that reuse common libraries of parameter types and search techniques. In this work we explore the use of such frameworks to develop general-purpose online services for program autotuning using the Software as a Service model. Beyond the common benefits of this model, the proposed approach opens up a number of unique opportunities, such as collecting performance data and utilizing it to improve further runs, or enabling remote online autotuning. However, the proposed autotuning-as-a-service approach also brings several challenges, such as accessing target systems, dealing with measurement latency, and supporting execution of user-provided code. This paper presents the first step towards implementing the proposed approach and addressing these challenges. We describe an implementation of a generic autotuning service that can be used for tuning arbitrary programs on user-provided computing systems. The service is based on the OpenTuner autotuning framework and runs on the Everest platform, which enables rapid development of computational web services. In contrast to OpenTuner, the service does not require installation of the framework, allows users to avoid writing code, and supports efficient parallel execution of measurement tasks across multiple machines. The performance of the service is evaluated by using it for tuning synthetic and real programs.
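As an illustration of the parallel execution of measurement tasks mentioned above, the following is a minimal sketch and not the service's actual implementation. It uses plain random search from the Python standard library rather than OpenTuner's search techniques. The search space, the `measure` cost function, and all names are hypothetical stand-ins, and a local thread pool stands in for dispatching measurements to remote machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search space: parameter name -> list of allowed values.
SEARCH_SPACE = {
    "block_size": [8, 16, 32, 64, 128],
    "unroll": [1, 2, 4, 8],
}


def measure(config):
    """Stand-in for running the target program and timing it.

    A real service would dispatch this task to a user-provided machine.
    Here a synthetic cost is returned, minimal at block_size=32, unroll=4.
    """
    return abs(config["block_size"] - 32) + abs(config["unroll"] - 4)


def random_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def autotune(rounds=20, batch_size=4, seed=0):
    """Random search that measures each batch of configurations in parallel."""
    rng = random.Random(seed)
    best_config, best_cost = None, float("inf")
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for _ in range(rounds):
            batch = [random_config(rng) for _ in range(batch_size)]
            # pool.map preserves input order, so costs line up with batch.
            costs = list(pool.map(measure, batch))
            for config, cost in zip(batch, costs):
                if cost < best_cost:
                    best_config, best_cost = config, cost
    return best_config, best_cost


if __name__ == "__main__":
    config, cost = autotune()
    print("best configuration:", config, "cost:", cost)
```

Measuring a whole batch at once amortizes the per-measurement latency that a remote service incurs, at the price of the search technique seeing results only once per batch.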
Index Terms
- Program autotuning as a service: opportunities and challenges