Halide Code Generation Framework in Phylanx

Tohid, R.; Shirzad, Shahrzad; Taylor, Christopher; Sakin, Sayef Azad; Isaacs, Katherine E.; Kaiser, Hartmut

doi:10.1007/978-3-031-31209-0_3

R. Tohid ORCID: orcid.org/0000-0001-7776-0380¹³,
Shahrzad Shirzad¹³,
Christopher Taylor¹⁴,
Sayef Azad Sakin¹⁵,
Katherine E. Isaacs¹⁵ &
…
Hartmut Kaiser¹³

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13835)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

Separating algorithms from their computation schedule has become a de facto solution to tackle the challenges of developing high performance code on modern heterogeneous architectures. Common approaches include domain-specific languages (DSLs), which provide familiar APIs to domain experts; code generation frameworks, which automate the generation of fast and portable code; and runtime systems, which manage threads for concurrency and parallelism. In this paper, we present the Halide code generation framework for the Phylanx distributed array processing platform. This extension enables compile-time optimization of Phylanx primitives for target architectures. To accomplish this, we (1) implemented new Phylanx primitives using Halide, (2) partially exported Halide’s thread pool API to carry out parallelism on HPX (Phylanx’s runtime) threads, and (3) showcased HPX performance analysis tools made available to Halide applications. The evaluation was done in two steps. First, we compare the performance of Halide applications running on Halide’s native runtime with that of the new HPX backend to verify that there is no cost associated with using HPX threads. Next, we compare the performance of several original implementations of Phylanx primitives against the new Halide implementations to verify the performance and portability benefits of Halide in the context of Phylanx.
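The idea of running generated parallel loops on another runtime's threads can be illustrated with a small, self-contained sketch. It is not the paper's code and does not use Halide or HPX: every name below (`task_fn`, `par_for_fn`, `set_par_for`, `run_squares`) is hypothetical, the hook is only loosely modeled on the kind of replaceable parallel-for entry point a code generator's runtime may expose, and `std::async` stands in for the lightweight threads a runtime such as HPX would supply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical signature of one iteration of a generated parallel loop.
using task_fn = int (*)(void *user_context, int index, std::uint8_t *closure);

// Hypothetical signature of the replaceable "run this loop in parallel" hook.
using par_for_fn = int (*)(void *user_context, task_fn task, int min, int size,
                           std::uint8_t *closure);

// Default backend: runs the iterations one after another.
int serial_par_for(void *ctx, task_fn task, int min, int size,
                   std::uint8_t *closure) {
    for (int i = min; i < min + size; ++i) {
        if (int err = task(ctx, i, closure)) return err;
    }
    return 0;
}

// Alternative backend: one std::async task per iteration. In the setting the
// abstract describes, this is where the other runtime's threads would be used.
int async_par_for(void *ctx, task_fn task, int min, int size,
                  std::uint8_t *closure) {
    assert(size >= 0);
    std::vector<std::future<int>> results;
    results.reserve(static_cast<std::size_t>(size));
    for (int i = min; i < min + size; ++i) {
        results.push_back(std::async(std::launch::async, task, ctx, i, closure));
    }
    int status = 0;
    for (auto &r : results) {
        int err = r.get();               // wait for every iteration
        if (err != 0 && status == 0) status = err;  // report the first error
    }
    return status;
}

// The currently installed backend; generated code only ever calls through it.
static par_for_fn current_par_for = serial_par_for;

// Installs a new backend and returns the previous one.
par_for_fn set_par_for(par_for_fn f) {
    par_for_fn old = current_par_for;
    current_par_for = f;
    return old;
}

// A stand-in for generated kernel code: out[i] = in[i] * in[i].
struct square_closure {
    const int *in;
    int *out;
};

int square_task(void *, int i, std::uint8_t *closure) {
    auto *c = reinterpret_cast<square_closure *>(closure);
    c->out[i] = c->in[i] * c->in[i];
    return 0;
}

int run_squares(const std::vector<int> &in, std::vector<int> &out) {
    out.resize(in.size());
    square_closure c{in.data(), out.data()};
    return current_par_for(nullptr, square_task, 0, static_cast<int>(in.size()),
                           reinterpret_cast<std::uint8_t *>(&c));
}
```

Calling `set_par_for(async_par_for)` before `run_squares` changes how the loop is executed without touching the kernel, which is the property that lets the two backends be compared on identical generated code.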

Notes

1.
https://github.com/halide/Halide/tree/master/apps/harris.


Author information

Authors and Affiliations

Center for Computation and Technology, Louisiana State University, Baton Rouge, USA
R. Tohid, Shahrzad Shirzad & Hartmut Kaiser
Tactical Computing Labs, Muenster, USA
Christopher Taylor
SCI Institute, University of Utah, Salt Lake City, USA
Sayef Azad Sakin & Katherine E. Isaacs

Corresponding author

Correspondence to R. Tohid.

Editor information

Editors and Affiliations

University of Glasgow, Glasgow, UK
Jeremy Singer
University of Glasgow, Glasgow, UK
Yehia Elkhatib
University of Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Dora Blanco Heras
Louisiana State University, Baton Rouge, LA, USA
Patrick Diehl
University of Edinburgh, Edinburgh, UK
Nick Brown
Universidade de Lisboa, Lisbon, Portugal
Aleksandar Ilic

About this paper

Cite this paper

Tohid, R., Shirzad, S., Taylor, C., Sakin, S.A., Isaacs, K.E., Kaiser, H. (2023). Halide Code Generation Framework in Phylanx. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_3

DOI: https://doi.org/10.1007/978-3-031-31209-0_3
Published: 02 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31208-3
Online ISBN: 978-3-031-31209-0
eBook Packages: Computer Science, Computer Science (R0)
