Exploiting Co-execution with OneAPI: Heterogeneity from a Modern Perspective

  • Conference paper
Euro-Par 2021: Parallel Processing (Euro-Par 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12820)

Abstract

Programming heterogeneous systems efficiently is a major challenge because of the complexity of their architectures. Intel oneAPI, a new and powerful standards-based unified programming model built on top of SYCL, addresses this challenge. In this paper, oneAPI is extended with co-execution strategies that run the same kernel across different devices, enabling the exploitation of static and dynamic policies. On top of that, static and dynamic load-balancing algorithms are integrated and analyzed. This work evaluates performance and energy efficiency for a well-known set of regular and irregular HPC benchmarks, using an integrated GPU and a CPU. Experimental results show that co-execution is worthwhile when using dynamic algorithms, and that efficiency improves further when using unified shared memory.
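
As an illustration of the two policy families named in the abstract, the following C++ sketch contrasts a static split of a kernel's index space with a dynamic, chunk-based assignment. It is not the authors' runtime and uses no SYCL or oneAPI calls: the names `Range`, `static_split`, and `dynamic_assign`, the per-device `speeds` weights, and the simulated idle times are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// A contiguous slice [begin, end) of the kernel's index space.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Static policy (sketch): split the index space once, before execution,
// in proportion to an assumed relative speed per device (hypothetical weights).
std::vector<Range> static_split(std::size_t n, const std::vector<double>& speeds) {
    assert(!speeds.empty());
    double total = 0.0;
    for (double s : speeds) total += s;

    std::vector<Range> slices;
    std::size_t begin = 0;
    double acc = 0.0;
    for (std::size_t d = 0; d < speeds.size(); ++d) {
        acc += speeds[d];
        // The last device takes whatever remains, so no item is lost to rounding.
        std::size_t end = (d + 1 == speeds.size())
                              ? n
                              : static_cast<std::size_t>(n * (acc / total));
        slices.push_back({begin, end});
        begin = end;
    }
    return slices;
}

// Dynamic policy (sketch): hand out fixed-size chunks, each to the device
// that becomes idle first. Idle times are simulated from the assumed speeds;
// a real runtime would observe actual kernel completion instead.
std::vector<std::size_t> dynamic_assign(std::size_t n, std::size_t chunk,
                                        const std::vector<double>& speeds) {
    assert(chunk > 0 && !speeds.empty());
    std::vector<double> free_at(speeds.size(), 0.0);   // simulated time each device is idle
    std::vector<std::size_t> items(speeds.size(), 0);  // work items given to each device

    for (std::size_t begin = 0; begin < n; begin += chunk) {
        std::size_t size = std::min(chunk, n - begin);
        auto it = std::min_element(free_at.begin(), free_at.end());
        auto d = static_cast<std::size_t>(std::distance(free_at.begin(), it));
        free_at[d] += static_cast<double>(size) / speeds[d];
        items[d] += size;
    }
    return items;
}
```

Under these assumptions, `static_split(100, {1.0, 3.0})` yields the slices [0, 25) and [25, 100), and `dynamic_assign(100, 10, {1.0, 4.0})` gives 20 items to the slower device and 80 to the faster one. The static split depends entirely on the weights being accurate, whereas the dynamic assignment adapts as devices finish chunks, at the cost of more scheduling steps.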

Notes

  1. https://github.com/oneAPI-scheduling/CoexecutorRuntime

Acknowledgment

This work has been supported by the Spanish Ministry of Education (FPU16/03299 grant), the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence.

Author information

Corresponding author

Correspondence to Raúl Nozal.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Nozal, R., Bosque, J.L. (2021). Exploiting Co-execution with OneAPI: Heterogeneity from a Modern Perspective. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science, vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_31

  • DOI: https://doi.org/10.1007/978-3-030-85665-6_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85664-9

  • Online ISBN: 978-3-030-85665-6

  • eBook Packages: Computer Science (R0)
