Abstract
The SYCL standard was released with the aim of increasing code portability in heterogeneous environments. For its part, Intel has launched the oneAPI toolkit, which includes Data Parallel C++ (DPC++), Intel's implementation of SYCL. SYCL is designed to let a single source code target multiple accelerators, such as multi-core CPUs, GPUs, or even FPGAs. In addition, the oneAPI C/C++ compiler supports OpenMP, which can also target both CPU and GPU devices. In this paper, a performance evaluation of SYCL and OpenMP is carried out using the well-known Non-negative Matrix Factorization (NMF) algorithm. Three NMF implementations (baseline, SYCL, and OpenMP) are developed to analyze the speedups on both CPU and GPU devices. Experimental results show that on CPUs both programming models deliver almost the same performance, whereas on GPUs SYCL slightly outperforms its OpenMP counterpart.
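For readers unfamiliar with NMF, the following sketch shows one common formulation: the multiplicative update rules of Lee and Seung for minimizing the squared Frobenius error ||V - WH||^2, where V (n x m), W (n x k) and H (k x m) are non-negative. The column loop of the H update and the row loop of the W update are independent, so they are parallelized on the CPU with OpenMP `parallel for` pragmas. This is a minimal illustration under those assumptions, not the authors' baseline, SYCL, or OpenMP implementation; the names `Matrix`, `nmf_error` and `nmf_step`, the row-major layout, and the `eps` guard are hypothetical choices. Compiled without OpenMP support (e.g. without `-fopenmp`), the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense matrix type: row-major storage in a flat vector.
using Matrix = std::vector<double>;

// Squared Frobenius error ||V - WH||^2 for V (n x m), W (n x k), H (k x m).
double nmf_error(const Matrix& V, const Matrix& W, const Matrix& H,
                 std::size_t n, std::size_t m, std::size_t k) {
  double err = 0.0;
#pragma omp parallel for reduction(+ : err)
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < m; ++j) {
      double wh = 0.0;
      for (std::size_t a = 0; a < k; ++a) wh += W[i * k + a] * H[a * m + j];
      const double d = V[i * m + j] - wh;
      err += d * d;
    }
  }
  return err;
}

// One round of Lee-Seung multiplicative updates (Frobenius objective):
//   H <- H .* (W^T V) ./ (W^T W H), then W <- W .* (V H^T) ./ (W H H^T).
// eps (an assumption of this sketch) guards against division by zero.
void nmf_step(const Matrix& V, Matrix& W, Matrix& H, std::size_t n,
              std::size_t m, std::size_t k, double eps = 1e-12) {
  assert(V.size() == n * m && W.size() == n * k && H.size() == k * m);

  // Gram matrix W^T W (k x k), shared by every column of H.
  Matrix WtW(k * k, 0.0);
  for (std::size_t a = 0; a < k; ++a)
    for (std::size_t b = 0; b < k; ++b)
      for (std::size_t i = 0; i < n; ++i)
        WtW[a * k + b] += W[i * k + a] * W[i * k + b];

  // Update H: each column j only reads and writes column j, so no races.
#pragma omp parallel for
  for (std::size_t j = 0; j < m; ++j) {
    std::vector<double> num(k, 0.0), den(k, 0.0);
    for (std::size_t a = 0; a < k; ++a) {
      for (std::size_t i = 0; i < n; ++i) num[a] += W[i * k + a] * V[i * m + j];
      for (std::size_t b = 0; b < k; ++b) den[a] += WtW[a * k + b] * H[b * m + j];
    }
    for (std::size_t a = 0; a < k; ++a) H[a * m + j] *= num[a] / (den[a] + eps);
  }

  // Gram matrix H H^T (k x k), computed from the updated H.
  Matrix HHt(k * k, 0.0);
  for (std::size_t a = 0; a < k; ++a)
    for (std::size_t b = 0; b < k; ++b)
      for (std::size_t j = 0; j < m; ++j)
        HHt[a * k + b] += H[a * m + j] * H[b * m + j];

  // Update W: each row i only reads and writes row i, so no races.
#pragma omp parallel for
  for (std::size_t i = 0; i < n; ++i) {
    std::vector<double> num(k, 0.0), den(k, 0.0);
    for (std::size_t a = 0; a < k; ++a) {
      for (std::size_t j = 0; j < m; ++j) num[a] += V[i * m + j] * H[a * m + j];
      for (std::size_t b = 0; b < k; ++b) den[a] += W[i * k + b] * HHt[b * k + a];
    }
    for (std::size_t a = 0; a < k; ++a) W[i * k + a] *= num[a] / (den[a] + eps);
  }
}
```

A typical driver initializes W and H with positive values, calls `nmf_step` for a fixed number of iterations, and monitors `nmf_error`. Lee and Seung showed that the Frobenius error does not increase under these updates (ignoring the tiny effect of `eps`), and the multiplicative form keeps W and H non-negative without any explicit projection.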
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Faqir-Rhazoui, Y., García, C., Tirado, F. (2023). Performance Portability Assessment: Non-negative Matrix Factorization as a Case Study. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_18
DOI: https://doi.org/10.1007/978-3-031-31209-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31208-3
Online ISBN: 978-3-031-31209-0
eBook Packages: Computer Science, Computer Science (R0)