Performance Portability Assessment: Non-negative Matrix Factorization as a Case Study

Faqir-Rhazoui, Youssef; García, Carlos; Tirado, Francisco

doi:10.1007/978-3-031-31209-0_18

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13835))

Included in the following conference series:

European Conference on Parallel Processing

Abstract

SYCL standard has been released with the conviction to increase code portability in heterogeneous environments. On its side, Intel has launched the oneAPI toolkit, which includes the Data Parallel C++ language, the Intel implementation of SYCL. SYCL is designed to use a single source code to target multiple accelerators, such as multi-core CPUs, GPUs, or even FPGAs. Additionally, the C/C++ oneAPI compiler also supports OpenMP which also allows targeting CPU and GPU devices. In this paper, a performance evaluation of SYCL and OpenMP is carried out using the well-known, Non-negative Matrix Factorization (NMF) algorithm. Three different NMF implementations (baseline, SYCL and OpenMP) are developed to analyze the speedups on both CPU and GPU devices. Experimental results show that while on CPUs both programming models report almost the same performance, on GPUs, SYCL slightly outperforms OpenMP counterpart.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 54.99; Price excludes VAT (USA)

Softcover Book: USD 69.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Portability and Scalability of OpenMP Offloading on State-of-the-Art Accelerators

Evaluating Performance Portability of Accelerator Programming Models using SPEC ACCEL 1.2 Benchmarks

Comparing High Performance Computing Accelerator Programming Models

Notes

1.
Available in: https://github.com/artecs-group/nmf-dpcpp.
2.
Available in: https://github.com/artecs-group/nmf-openmp.
3.
https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/vm/div.html#onemkl-vm-div.

References

Barrett, T., Wilhite, S.E., et al.: NCBI GEO: archive for functional genomics data sets-update. Nucleic Acids Res. 41(D1), D991–D995 (2012)
Article Google Scholar
Breyer, M., Van Craen, A., Pflüger, D.: A comparison of SYCL, OpenCL, CUDA, and OpenMP for massively parallel support vector machine classification on multi-vendor hardware. In: International Workshop on OpenCL. IWOCL 2022. Association for Computing Machinery, New York (2022)
Google Scholar
Brunet, J.P., Tamayo, P., Golub, T.R., Mesirov, J.P.: Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. 101(12), 4164–4169 (2004)
Article Google Scholar
Castaño, G., Faqir-Rhazoui, Y., García, C., Prieto-Matías, M.: Evaluation of Intel’s DPC++ Compatibility Tool in heterogeneous computing. J. Parallel Distrib. Comput. 165, 120–129 (2022)
Article Google Scholar
Chopra, P., Lee, J., Kang, J., Lee, S.: Improving cancer classification accuracy using gene pairs. PLoS ONE 5(12), e14305 (2010)
Article Google Scholar
Christgau, S., Steinke, T.: Porting a legacy CUDA stencil code to oneAPI. In: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 359–367 (2020)
Google Scholar
Gottschlag, M., Brantsch, P., Bellosa, F.: Automatic core specialization for AVX-512 applications. In: Proceedings of the 13th ACM International Systems and Storage Conference, pp. 25–35. Association for Computing Machinery (2020)
Google Scholar
Gottschlag, M., Schmidt, T., Bellosa, F.: AVX overhead profiling: how much does your fast code slow you down? In: Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems, pp. 59–66. Association for Computing Machinery (2020)
Google Scholar
Intel: oneAPI GPU Optimization Guide (2021). https://software.intel.com/content/www/us/en/develop/documentation/oneapi-gpu-optimization-guide
Khronos SYCL working group: Sycl 1.2.1 specification (2020). https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf
Konda, S.: OpenMP* features and extensions supported in Intel oneAPI DPC++/C++ compiler (2021). https://software.intel.com/content/www/us/en/develop/articles/openmp-features-and-extensions-supported-in-icx
Kwak, H., Lee, B., et al.: Effects of multithreading on cache performance. IEEE Trans. Comput. 48(2), 176–184 (1999)
Article Google Scholar
Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)
Article MATH Google Scholar
Lin, X., Boutros, P.C.: Optimization and expansion of non-negative matrix factorization. BMC Bioinform. 21(1), 1–10 (2020)
Article Google Scholar
Noudohouenou, J., Hariharan, N.: Using OpenMP accelerator offload for programming heterogeneous architectures (2021). https://techdecoded.intel.io/resources/using-openmp-accelerator-offload-for-programming-heterogeneous-architectures
Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)
Article Google Scholar
Poenaru, A., Lin, W.-C., McIntosh-Smith, S.: A performance analysis of modern parallel programming models using a compute-bound application. In: Chamberlain, B.L., Varbanescu, A.-L., Ltaief, H., Luszczek, P. (eds.) ISC High Performance 2021. LNCS, vol. 12728, pp. 332–350. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-78713-4_18
Chapter Google Scholar
Reinders, J.: Benefits of adopting LLVM (2021). https://software.intel.com/content/www/us/en/develop/blogs/adoption-of-llvm-complete-icx
Reinders, J., Ashbaugh, B., et al.: Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems Using C++ and SYCL. Springer, Cham (2021). https://doi.org/10.1007/978-1-4842-5574-2
Book Google Scholar
Reyes, R., Lomüller, V.: SYCL: single-source C++ accelerator programming. In: Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing. Advances in Parallel Computing, vol. 27, pp. 673–682. IOS Press (2015)
Google Scholar

Download references

Author information

Authors and Affiliations

Universidad Complutense Madrid, Madrid, Spain
Youssef Faqir-Rhazoui, Carlos García & Francisco Tirado
Instituto de Tecnología del Conocimiento, Madrid, Spain
Carlos García

Authors

Youssef Faqir-Rhazoui
View author publications
You can also search for this author in PubMed Google Scholar
Carlos García
View author publications
You can also search for this author in PubMed Google Scholar
Francisco Tirado
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Youssef Faqir-Rhazoui .

Editor information

Editors and Affiliations

University of Glasgow, Glasgow, UK
Jeremy Singer
University of Glasgow, Glasgow, UK
Yehia Elkhatib
University of Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Dora Blanco Heras
Louisiana State University, Baton Rouge, LA, USA
Patrick Diehl
University of Edinburgh, Edinburgh, UK
Nick Brown
Universidade de Lisboa, Lisbon, Portugal
Aleksandar Ilic

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Faqir-Rhazoui, Y., García, C., Tirado, F. (2023). Performance Portability Assessment: Non-negative Matrix Factorization as a Case Study. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_18

Download citation

DOI: https://doi.org/10.1007/978-3-031-31209-0_18
Published: 02 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31208-3
Online ISBN: 978-3-031-31209-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Performance Portability Assessment: Non-negative Matrix Factorization as a Case Study