Skip to main content

Performance Portability Assessment: Non-negative Matrix Factorization as a Case Study

  • Conference paper
  • First Online:
Euro-Par 2022: Parallel Processing Workshops (Euro-Par 2022)

Abstract

SYCL standard has been released with the conviction to increase code portability in heterogeneous environments. On its side, Intel has launched the oneAPI toolkit, which includes the Data Parallel C++ language, the Intel implementation of SYCL. SYCL is designed to use a single source code to target multiple accelerators, such as multi-core CPUs, GPUs, or even FPGAs. Additionally, the C/C++ oneAPI compiler also supports OpenMP which also allows targeting CPU and GPU devices. In this paper, a performance evaluation of SYCL and OpenMP is carried out using the well-known, Non-negative Matrix Factorization (NMF) algorithm. Three different NMF implementations (baseline, SYCL and OpenMP) are developed to analyze the speedups on both CPU and GPU devices. Experimental results show that while on CPUs both programming models report almost the same performance, on GPUs, SYCL slightly outperforms OpenMP counterpart.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    Available in: https://github.com/artecs-group/nmf-dpcpp.

  2. 2.

    Available in: https://github.com/artecs-group/nmf-openmp.

  3. 3.

    https://spec.oneapi.io/versions/latest/elements/oneMKL/source/domains/vm/div.html#onemkl-vm-div.

References

  1. Barrett, T., Wilhite, S.E., et al.: NCBI GEO: archive for functional genomics data sets-update. Nucleic Acids Res. 41(D1), D991–D995 (2012)

    Article  Google Scholar 

  2. Breyer, M., Van Craen, A., Pflüger, D.: A comparison of SYCL, OpenCL, CUDA, and OpenMP for massively parallel support vector machine classification on multi-vendor hardware. In: International Workshop on OpenCL. IWOCL 2022. Association for Computing Machinery, New York (2022)

    Google Scholar 

  3. Brunet, J.P., Tamayo, P., Golub, T.R., Mesirov, J.P.: Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. 101(12), 4164–4169 (2004)

    Article  Google Scholar 

  4. Castaño, G., Faqir-Rhazoui, Y., García, C., Prieto-Matías, M.: Evaluation of Intel’s DPC++ Compatibility Tool in heterogeneous computing. J. Parallel Distrib. Comput. 165, 120–129 (2022)

    Article  Google Scholar 

  5. Chopra, P., Lee, J., Kang, J., Lee, S.: Improving cancer classification accuracy using gene pairs. PLoS ONE 5(12), e14305 (2010)

    Article  Google Scholar 

  6. Christgau, S., Steinke, T.: Porting a legacy CUDA stencil code to oneAPI. In: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 359–367 (2020)

    Google Scholar 

  7. Gottschlag, M., Brantsch, P., Bellosa, F.: Automatic core specialization for AVX-512 applications. In: Proceedings of the 13th ACM International Systems and Storage Conference, pp. 25–35. Association for Computing Machinery (2020)

    Google Scholar 

  8. Gottschlag, M., Schmidt, T., Bellosa, F.: AVX overhead profiling: how much does your fast code slow you down? In: Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems, pp. 59–66. Association for Computing Machinery (2020)

    Google Scholar 

  9. Intel: oneAPI GPU Optimization Guide (2021). https://software.intel.com/content/www/us/en/develop/documentation/oneapi-gpu-optimization-guide

  10. Khronos SYCL working group: Sycl 1.2.1 specification (2020). https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf

  11. Konda, S.: OpenMP* features and extensions supported in Intel oneAPI DPC++/C++ compiler (2021). https://software.intel.com/content/www/us/en/develop/articles/openmp-features-and-extensions-supported-in-icx

  12. Kwak, H., Lee, B., et al.: Effects of multithreading on cache performance. IEEE Trans. Comput. 48(2), 176–184 (1999)

    Article  Google Scholar 

  13. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)

    Article  MATH  Google Scholar 

  14. Lin, X., Boutros, P.C.: Optimization and expansion of non-negative matrix factorization. BMC Bioinform. 21(1), 1–10 (2020)

    Article  Google Scholar 

  15. Noudohouenou, J., Hariharan, N.: Using OpenMP accelerator offload for programming heterogeneous architectures (2021). https://techdecoded.intel.io/resources/using-openmp-accelerator-offload-for-programming-heterogeneous-architectures

  16. Paatero, P., Tapper, U.: Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)

    Article  Google Scholar 

  17. Poenaru, A., Lin, W.-C., McIntosh-Smith, S.: A performance analysis of modern parallel programming models using a compute-bound application. In: Chamberlain, B.L., Varbanescu, A.-L., Ltaief, H., Luszczek, P. (eds.) ISC High Performance 2021. LNCS, vol. 12728, pp. 332–350. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-78713-4_18

    Chapter  Google Scholar 

  18. Reinders, J.: Benefits of adopting LLVM (2021). https://software.intel.com/content/www/us/en/develop/blogs/adoption-of-llvm-complete-icx

  19. Reinders, J., Ashbaugh, B., et al.: Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems Using C++ and SYCL. Springer, Cham (2021). https://doi.org/10.1007/978-1-4842-5574-2

    Book  Google Scholar 

  20. Reyes, R., Lomüller, V.: SYCL: single-source C++ accelerator programming. In: Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing. Advances in Parallel Computing, vol. 27, pp. 673–682. IOS Press (2015)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Youssef Faqir-Rhazoui .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Faqir-Rhazoui, Y., García, C., Tirado, F. (2023). Performance Portability Assessment: Non-negative Matrix Factorization as a Case Study. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-31209-0_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31208-3

  • Online ISBN: 978-3-031-31209-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics