Exploiting Co-execution with OneAPI: Heterogeneity from a Modern Perspective

Nozal, Raúl; Bosque, Jose Luis

doi:10.1007/978-3-030-85665-6_31

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12820)

Included in the following conference series: European Conference on Parallel Processing

Abstract

Programming heterogeneous systems efficiently is a major challenge because of the complexity of their architectures. Intel oneAPI, a new and powerful standards-based unified programming model built on top of SYCL, addresses these issues. In this paper, oneAPI is extended with co-execution strategies that run the same kernel across different devices, enabling the exploitation of static and dynamic policies. On top of that, static and dynamic load-balancing algorithms are integrated and analyzed. This work evaluates performance and energy efficiency for a well-known set of regular and irregular HPC benchmarks, using an integrated GPU and a CPU. Experimental results show that co-execution is worthwhile when using dynamic algorithms, and that efficiency improves further when using unified shared memory.
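
The difference between static and dynamic load balancing mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' Coexecutor Runtime and does not use oneAPI or SYCL; the `Device` class, the `speed` values, the package size, and both function names are hypothetical, chosen only to show how a one-off proportional split differs from handing out fixed-size packages to whichever device becomes idle first.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    speed: float  # assumed throughput in work-items per time unit (hypothetical)


def static_split(total, devices, shares):
    """Split the index range [0, total) once, proportionally to `shares`.

    Returns a list of (device name, offset, size) tuples. The last device
    receives the remainder so that every work-item is assigned.
    """
    ranges = []
    offset = 0
    for i, (dev, share) in enumerate(zip(devices, shares)):
        last = i == len(devices) - 1
        size = total - offset if last else int(total * share)
        ranges.append((dev.name, offset, size))
        offset += size
    return ranges


def dynamic_schedule(total, devices, package_size):
    """Simulate a dynamic policy: each fixed-size package goes to the
    device that becomes idle first (ties go to the earlier device).

    Returns (work-items processed per device, simulated makespan).
    """
    free_at = {d.name: 0.0 for d in devices}  # time at which each device is idle
    done = {d.name: 0 for d in devices}       # work-items processed per device
    offset = 0
    while offset < total:
        size = min(package_size, total - offset)
        dev = min(devices, key=lambda d: free_at[d.name])
        free_at[dev.name] += size / dev.speed
        done[dev.name] += size
        offset += size
    return done, max(free_at.values())


if __name__ == "__main__":
    # Hypothetical speeds: the GPU is assumed to be 4x faster than the CPU.
    devices = [Device("cpu", 1.0), Device("gpu", 4.0)]
    print(static_split(1000, devices, [0.5, 0.5]))
    print(dynamic_schedule(1000, devices, 100))
```

Under these assumed speeds, an even static split leaves the faster device idle while the slower one finishes its half, whereas the dynamic policy ends up giving the faster device proportionally more packages without knowing the speeds in advance. A real runtime would also have to account for package management and data-transfer overheads, which this sketch ignores.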

Notes

1. https://github.com/oneAPI-scheduling/CoexecutorRuntime

Acknowledgment

This work has been supported by the Spanish Ministry of Education (FPU16/03299 grant), the Spanish Science and Technology Commission under contract PID2019-105660RB-C22, and the European HiPEAC Network of Excellence.

Author information

Authors and Affiliations

Department of Computer Science and Electronics, Universidad de Cantabria, Santander, Spain
Raúl Nozal & Jose Luis Bosque

Corresponding author

Correspondence to Raúl Nozal.

Editor information

Editors and Affiliations

Universidade de Lisboa, Lisbon, Portugal
Leonel Sousa
Universidade de Lisboa, Lisbon, Portugal
Nuno Roma
Universidade de Lisboa, Lisbon, Portugal
Pedro Tomás

About this paper

Cite this paper

Nozal, R., Bosque, J.L. (2021). Exploiting Co-execution with OneAPI: Heterogeneity from a Modern Perspective. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science, vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_31

DOI: https://doi.org/10.1007/978-3-030-85665-6_31
Published: 25 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85664-9
Online ISBN: 978-3-030-85665-6
eBook Packages: Computer Science; Computer Science (R0)
