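Next Generation Asynchronous Adaptive Specialization for Data-Parallel Functional Array Processing in SAC: Accelerating the Availability of Specialized High Performance Code

ABSTRACT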
Data-parallel processing of multi-dimensional functional (immutable) arrays is characterized by a fundamental trade-off between software engineering principles on the one hand and runtime performance concerns on the other. Software engineering principles demand that code be written in a generic style that abstracts from structural properties of arrays, such as rank and shape, as much as possible. Runtime performance, in contrast, requires an optimizing compiler to have as much information as possible about these very same structural properties at compile time. Asynchronous adaptive specialization, i.e., specializing generic code at application runtime to the concrete data being processed, has proven to be an effective way to reconcile these conflicting demands.
In this paper, we revisit asynchronous adaptive specialization in the context of the functional data-parallel array language SaC. We provide a comprehensive analysis of its strengths and weaknesses and propose improvements to its design and implementation. These improvements primarily aim at making specializations available to running applications as quickly as possible. To this end, we propose four complementary measures. Bulk adaptive specialization speculatively waits for future specialization requests to materialize instead of addressing each request individually. Prioritized adaptive specialization aims to select the most profitable specializations first. Parallel adaptive specialization reserves multiple cores for specialization and thus computes several specializations simultaneously. Finally, persistent adaptive specialization preserves specializations across independent program runs and even across unrelated applications.
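To make the interplay of bulk and prioritized specialization more concrete, the following is a minimal sketch under stated assumptions, not the SaC runtime system's actual design. It assumes that a specialization request can be identified by a function name plus the concrete argument shapes observed at runtime, that the end of the bulk waiting period is modeled by an explicit call to the hypothetical `drain()` method rather than by a timer, and that "profit" is estimated by the hypothetical heuristic `profit()` (request frequency times the number of array elements passed). The class name `BulkSpecializationQueue` is likewise illustrative.

```python
import heapq
import math
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical request key: function name plus the concrete argument
# shapes observed at runtime (not the actual SaC runtime representation).
Shape = Tuple[int, ...]
Request = Tuple[str, Tuple[Shape, ...]]


class BulkSpecializationQueue:
    """Buffers specialization requests and releases them by priority."""

    def __init__(self) -> None:
        self._pending: Counter = Counter()  # request -> number of times seen
        self._handed_over: Set[Request] = set()  # already given to the specializer

    def submit(self, fn: str, shapes: Tuple[Shape, ...]) -> None:
        """Record a request instead of specializing it immediately (bulk)."""
        key = (fn, shapes)
        if key not in self._handed_over:
            self._pending[key] += 1  # duplicates are merged, not recompiled

    @staticmethod
    def profit(key: Request, hits: int) -> int:
        """Hypothetical profit estimate: request frequency times the total
        number of array elements passed to the function."""
        _, shapes = key
        return hits * sum(math.prod(s) for s in shapes)

    def drain(self) -> List[Request]:
        """End the bulk waiting period: return pending requests, most profitable first."""
        heap = [(-self.profit(k, n), k) for k, n in self._pending.items()]
        heapq.heapify(heap)  # min-heap on negated profit = max-profit first
        order: List[Request] = []
        while heap:
            _, key = heapq.heappop(heap)
            order.append(key)
            self._handed_over.add(key)  # treated as handed to the specializer
        self._pending.clear()
        return order


if __name__ == "__main__":
    q = BulkSpecializationQueue()
    q.submit("matmul", ((100, 100), (100, 100)))
    q.submit("sum", ((10,),))
    q.submit("matmul", ((100, 100), (100, 100)))
    q.submit("sum", ((10,),))
    q.submit("sum", ((10,),))
    print(q.drain())  # matmul (profit 40000) before sum (profit 30)
```

Waiting before acting lets identical requests collapse into a single specialization, and ordering the buffered requests by estimated profit decides which specialization becomes available first. In a parallel setting, several entries returned by `drain()` could be handed to multiple specializer cores at once.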