Abstract
Separating algorithms from their computation schedules has become a de facto approach to developing high-performance code on modern heterogeneous architectures. Common approaches include domain-specific languages (DSLs), which provide familiar APIs to domain experts; code generation frameworks, which automate the generation of fast and portable code; and runtime systems, which manage threads for concurrency and parallelism. In this paper, we present the Halide code generation framework for the Phylanx distributed array processing platform. This extension enables compile-time optimization of Phylanx primitives for target architectures. To accomplish this, we (1) implemented new Phylanx primitives using Halide, (2) partially exported Halide’s thread pool API to carry out parallelism on HPX (Phylanx’s runtime) threads, and (3) showcased the HPX performance analysis tools made available to Halide applications. The evaluation was done in two steps. First, we compare the performance of Halide applications running on Halide’s native runtime with that of the new HPX backend to verify that using HPX threads adds no overhead. Next, we compare the performance of several original implementations of Phylanx primitives against the new Halide implementations to verify the performance and portability benefits of Halide in the context of Phylanx.
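The idea of keeping an algorithm fixed while swapping the backend that runs its parallel loop can be illustrated with a minimal sketch in plain C++. This is not the paper's implementation and uses neither Halide nor HPX: every name below (`ParForBackend`, `serial_par_for`, `threaded_par_for`, `blur_rows`) is hypothetical, and `std::thread` merely stands in for a task-based runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// All names here are hypothetical; this is not Halide's or HPX's API.

// A parallel-for backend runs task(i) for every i in [min, min + extent).
using ParForBackend =
    std::function<void(int min, int extent, const std::function<void(int)>& task)>;

// Backend 1: a plain serial loop.
inline void serial_par_for(int min, int extent,
                           const std::function<void(int)>& task) {
    for (int i = min; i < min + extent; ++i) task(i);
}

// Backend 2: a fixed number of std::thread workers, each taking a strided
// share of the iterations. std::thread is only a stand-in for a runtime's
// lightweight threads.
inline void threaded_par_for(int min, int extent,
                             const std::function<void(int)>& task) {
    const int workers = 4;
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([=, &task] {
            for (int i = min + w; i < min + extent; i += workers) task(i);
        });
    }
    for (auto& t : pool) t.join();
}

// The algorithm: a 3-point horizontal blur with clamped borders.
// The schedule: rows are distributed by whichever backend the caller passes.
inline std::vector<float> blur_rows(const std::vector<float>& in, int width,
                                    int height, const ParForBackend& backend) {
    assert(in.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> out(in.size());
    backend(0, height, [&](int y) {
        // Each task writes only its own row, so rows can run concurrently.
        for (int x = 0; x < width; ++x) {
            const int xl = x > 0 ? x - 1 : x;
            const int xr = x < width - 1 ? x + 1 : x;
            out[y * width + x] =
                (in[y * width + xl] + in[y * width + x] + in[y * width + xr]) / 3.0f;
        }
    });
    return out;
}
```

Calling `blur_rows(in, w, h, serial_par_for)` and `blur_rows(in, w, h, threaded_par_for)` should give identical output, since the backend only changes how rows are distributed and not what each row computes.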
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tohid, R., Shirzad, S., Taylor, C., Sakin, S.A., Isaacs, K.E., Kaiser, H. (2023). Halide Code Generation Framework in Phylanx. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_3
DOI: https://doi.org/10.1007/978-3-031-31209-0_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31208-3
Online ISBN: 978-3-031-31209-0
eBook Packages: Computer Science, Computer Science (R0)