
A Full-System Perspective on UPMEM Performance

Authors:
Birte Friesel
Universität Osnabrück, Osnabrück, Germany
https://orcid.org/0000-0002-0688-9440

Marcel Lütke Dreimann
Universität Osnabrück, Osnabrück, Germany
https://orcid.org/0009-0007-2426-4798

Olaf Spinczyk
Universität Osnabrück, Osnabrück, Germany
https://orcid.org/0000-0001-9469-2367

DIMES '23: Proceedings of the 1st Workshop on Disruptive Memory Systems, October 2023, Pages 1–7
https://doi.org/10.1145/3609308.3625266

Published: 23 October 2023

ABSTRACT

Recently, UPMEM has introduced the first commercially available processing in memory (PIM) platform. Its key feature is DRAM chips with built-in RISC CPUs for in-memory data processing. Naturally, this has sparked interest in the research community, which was previously limited to PIM simulators and custom FPGA prototypes. One result is the PrIM benchmark suite, which combines an in-depth analysis of PIM performance with benchmarks that measure the speedup of PIM over processing on conventional CPUs and GPUs [10]. However, the current generation of UPMEM PIM faces limitations such as memory interleaving, and as such does not provide true in-memory computing. Applications must store data in DRAM and transfer it to and from UPMEM modules for processing; from this perspective, the modules behave just like computational offloading engines. This paper examines the ramifications of treating them as such in comparative performance benchmarks. By extending the PrIM suite to address the challenges that computational offloading benchmarks face, we show that such a full-system perspective can drastically alter offloading recommendations, with 9 of 11 previously UPMEM-friendly benchmarks now performing best on a conventional server CPU.

References

[1] D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. 1991. The NAS Parallel Benchmarks: Summary and Preliminary Results. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Albuquerque, New Mexico, USA) (Supercomputing '91). Association for Computing Machinery, New York, NY, USA, 158–165.
[2] Alexander Baumstark, Muhammad Attahir Jibril, and Kai-Uwe Sattler. 2023. Accelerating Large Table Scan using Processing-In-Memory Technology. In BTW 2023. Gesellschaft für Informatik e.V., Bonn, 797–814.
[3] Stefano Corda, Madhurya Kumaraswamy, Ahsan Javed Awan, Roel Jordans, Akash Kumar, and Henk Corporaal. 2021. NMPO: Near-Memory Computing Profiling and Offloading. In 2021 24th Euromicro Conference on Digital System Design (DSD). 259–267.
[4] Stefano Corda, Gagandeep Singh, Ahsan Jawed Awan, Roel Jordans, and Henk Corporaal. 2019. Platform Independent Software Analysis for Near Memory Computing. In 2019 22nd Euromicro Conference on Digital System Design (DSD). 606–609.
[5] Andrew Davison. 1995. Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers. Supercomputing Review (August 1995), 54–55.
[6] Fabrice Devaux. 2019. The true Processing In Memory accelerator. In 2019 IEEE Hot Chips 31 Symposium (HCS). 1–24.
[7] François Duhem, Fabrice Muller, and Philippe Lorenzini. 2011. FaRM: Fast Reconfiguration Manager for Reducing Reconfiguration Time Overhead on FPGA. In Reconfigurable Computing: Architectures, Tools and Applications, Andreas Koch, Ram Krishnamurthy, John McAllister, Roger Woods, and Tarek El-Ghazawi (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 253–260.
[8] Scott Grauer-Gray, Lifan Xu, Robert Searles, Sudhee Ayalasomayajula, and John Cavazos. 2012. Auto-tuning a high-level language targeted to GPU codes. In 2012 Innovative Parallel Computing (InPar). 1–10.
[9] Khronos OpenCL Working Group. 2023. The OpenCL specification version 3.0.14. (2023). https://registry.khronos.org/OpenCL/specs/3.0-unified/pdf/OpenCL_API.pdf
[10] Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu. 2022. Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System. IEEE Access 10 (2022), 52565–52608.
[11] Torsten Hoefler and Roberto Belli. 2015. Scientific Benchmarking of Parallel Computing Systems: Twelve Ways to Tell the Masses When Reporting Performance Results. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, Texas) (SC '15). Association for Computing Machinery, New York, NY, USA, Article 73, 12 pages.
[12] Cheol-Ho Hong, Ivor Spence, and Dimitrios S. Nikolopoulos. 2017. GPU Virtualization and Scheduling Methods: A Comprehensive Survey. ACM Comput. Surv. 50, 3, Article 35 (June 2017), 37 pages.
[13] Nina Ihde, Paula Marten, Ahmed Eleliemy, Gabrielle Poerwawinata, Pedro Silva, Ilin Tolovski, Florina M. Ciorba, and Tilmann Rabl. 2022. A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks. In Performance Evaluation and Benchmarking, Raghunath Nambiar and Meikel Poess (Eds.). Springer International Publishing, Cham, 98–118.
[14] Donghun Lee, Andrew Chang, Minseon Ahn, Jongmin Gim, Jungmin Kim, Jaemin Jung, Kang-Woo Choi, Vincent Pham, Oliver Rebholz, Krishna T. Malladi, and Yang-Seok Ki. 2020. Optimizing Data Movement with Near-Memory Acceleration of In-memory DBMS. In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT 2020, Copenhagen, Denmark, March 30 - April 02, 2020, Angela Bonifati, Yongluan Zhou, Marcos Antonio Vaz Salles, Alexander Böhm, Dan Olteanu, George H. L. Fletcher, Arijit Khan, and Bin Yang (Eds.). OpenProceedings.org, 371–374.
[15] Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, and Pradeep Dubey. 2010. Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU. SIGARCH Comput. Archit. News 38, 3 (June 2010), 451–460.
[16] Kyprianos Papadimitriou, Apostolos Dollas, and Scott Hauck. 2011. Performance of Partial Reconfiguration in FPGA Systems: A Survey and a Cost Model. ACM Trans. Reconfigurable Technol. Syst. 4, 4, Article 36 (Dec. 2011), 24 pages.
[17] Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, and Jeremy Kepner. 2019. Survey and Benchmarking of Machine Learning Accelerators. In 2019 IEEE High Performance Extreme Computing Conference (HPEC). 1–9.
[18] Robert Schmid, Max Plauth, Lukas Wenzel, Felix Eberhardt, and Andreas Polze. 2020. Accessible Near-Storage Computing with FPGAs. In Proceedings of the Fifteenth European Conference on Computer Systems (Heraklion, Greece) (EuroSys '20). Association for Computing Machinery, New York, NY, USA, Article 28, 12 pages.
[19] Janet Tseng, Ren Wang, James Tsai, Yipeng Wang, and Tsung-Yuan Charlie Tai. 2017. Accelerating Open VSwitch with Integrated GPU. In Proceedings of the Workshop on Kernel-Bypass Networks (Los Angeles, CA, USA) (KBNets '17). Association for Computing Machinery, New York, NY, USA, 7–12.
[20] Yash Ukidave, Fanny Nina Paravecino, Leiming Yu, Charu Kalra, Amir Momeni, Zhongliang Chen, Nick Materise, Brett Daley, Perhaad Mistry, and David Kaeli. 2015. NUPAR: A Benchmark Suite for Modern GPU Architectures. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering (Austin, Texas, USA) (ICPE '15). Association for Computing Machinery, New York, NY, USA, 253–264.
[21] UPMEM. 2023. UPMEM SDK. https://sdk.upmem.com/ Version 2023.1.0.

Index Terms

A Full-System Perspective on UPMEM Performance
1. Computing methodologies
2. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging architectures
    2. Memory and dense storage


Published in

DIMES '23: Proceedings of the 1st Workshop on Disruptive Memory Systems
October 2023
64 pages
ISBN:9798400703003
DOI:10.1145/3609308

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
benchmarks
near-memory computing
processing in memory
computational offloading
Qualifiers
- research-article
Acceptance Rates
DIMES '23 Paper Acceptance Rate: 8 of 17 submissions, 47%
Overall Acceptance Rate: 8 of 17 submissions, 47%