DOI: 10.1145/3195612.3195616
research-article

Exhaustive evaluation of memory-latency sensitivity on manycore processors with large cache

Published: 15 March 2018

Abstract

DIMM-type 3D XPoint devices are planned to launch in 2018, and machines that use such devices as large main memory will become commodity in the near future. It is important to evaluate application performance on those machine configurations beforehand, considering the effects of larger main memory latency. The objective of this paper is to propose an accurate, high-throughput evaluation methodology for exhaustive experiments that cover many applications under various multidimensional conditions. The target architecture is manycore processors such as Xeon Phi KNL, which are assumed to have a large DRAM cache in addition to 3D XPoint main memory. To evaluate the latency effects accurately, it is necessary to take stall cycles caused by main memory accesses into account. However, cycle-accurate simulators are too slow for this purpose. Instead, we use the performance counters of processors. The current Xeon Phi KNL, however, does not have any performance counters for these stalls. To address this issue, our method integrates measurement results on Xeon Skylake-SP, which has the desired performance counters and a memory system close to that of KNL. The paper shows results of exhaustive experiments, which take two days with the proposed method for arbitrary latency settings. With a cycle-accurate simulator, the equivalent experiments would take about 180 years per latency setting.
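
To illustrate why stall cycles matter for latency sensitivity, the following is a minimal sketch of a first-order model in which only the cycles stalled on main memory scale with latency. This is an assumption made for illustration, not the model used in the paper: the function names, parameters, and the linear scaling rule are all hypothetical, and the counter values would in practice come from hardware performance counters.

```python
def estimate_cycles(total_cycles, mem_stall_cycles, base_latency_ns, target_latency_ns):
    """Estimate execution cycles at a different main memory latency.

    Illustrative assumption: cycles stalled on main memory scale linearly
    with latency, and all other cycles are unaffected.
    """
    if not 0 <= mem_stall_cycles <= total_cycles:
        raise ValueError("mem_stall_cycles must lie between 0 and total_cycles")
    if base_latency_ns <= 0 or target_latency_ns <= 0:
        raise ValueError("latencies must be positive")
    scale = target_latency_ns / base_latency_ns
    non_stall_cycles = total_cycles - mem_stall_cycles
    return non_stall_cycles + mem_stall_cycles * scale


def slowdown(total_cycles, mem_stall_cycles, base_latency_ns, target_latency_ns):
    """Return estimated cycles relative to the measured baseline."""
    estimated = estimate_cycles(
        total_cycles, mem_stall_cycles, base_latency_ns, target_latency_ns
    )
    return estimated / total_cycles


if __name__ == "__main__":
    # Made-up counter values: 20% of cycles stalled on memory, latency tripled.
    print(slowdown(1000, 200, 100, 300))
```

Under this simplified model, one measured pair of counters can be re-evaluated for any target latency without rerunning the application, which is the kind of property that makes a counter-based approach far cheaper than simulation.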

Cited By

  • (2024) BlueJay: A Platform to Quantifying the Impact of Memory Latency on Datacenter Application Performance. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 489-495. DOI: 10.1109/CCGrid59990.2024.00061. Online publication date: 6-May-2024
  • (2022) LightPC. Proceedings of the 49th Annual International Symposium on Computer Architecture, 289-305. DOI: 10.1145/3470496.3527397. Online publication date: 11-Jun-2022
  • (2019) System Characteristics and Performance Analysis in Multi and Many-core Architectures. Journal of Digital Contents Society 20:3, 597-603. DOI: 10.9728/dcs.2019.20.3.597. Online publication date: 31-Mar-2019

      Published In

      HP3C: Proceedings of the 2nd International Conference on High Performance Compilation, Computing and Communications
      March 2018
      123 pages
      ISBN: 9781450363372
      DOI: 10.1145/3195612
      Conference Chair: Steven Guan

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. 3D Xpoint
      2. DRAM cache memory
      3. benchmarking
      4. manycore
      5. multithread
      6. nonvolatile memory
      7. performance evaluation

      Qualifiers

      • Research-article

      Conference

      HP3C 2018
