ABSTRACT
Fully characterizing the performance of a new memory technology is typically a subtle process, because it is difficult to subject the memory to different access patterns before a full system has been built. Simple performance metrics, such as raw bandwidth, do not reveal enough about the memory's suitability for different architectural design choices, such as processing in memory, how much performance relies on relaxed ordering semantics, or how atomics should be implemented.
This paper discusses the use of the ApexMAP synthetic benchmarks to assess the Hybrid Memory Cube (HMC) technology. ApexMAP uses a simple model of spatial and temporal locality to create many application probes that subject the memory to different access patterns. We use a Verilog implementation of ApexMAP to show the impact of contending requests, flow control, and access granularity on HMC performance. We observe a wide variation (up to 20×) in performance depending on the application locality parameters and the HMC architectural configuration.
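To make the locality model concrete, the sketch below generates an ApexMAP-style address stream in software. It assumes the power-law formulation described in the original Apex-Map papers: a data array of M words, a block length L of contiguous accesses (spatial locality), and an exponent α in (0, 1] that skews block start addresses toward the beginning of the array (temporal locality), with α = 1 giving uniform random starts. The function name `apex_map_addresses` and its parameters are hypothetical, and this is an illustration of the model only, not the Verilog implementation used in the paper.

```python
import random


def apex_map_addresses(mem_words, block_len, alpha, num_blocks, seed=0):
    """Hypothetical sketch of an ApexMAP-style address stream.

    mem_words  -- M, size of the data array in words
    block_len  -- L, contiguous words accessed per block (spatial locality)
    alpha      -- exponent in (0, 1]; 1.0 gives uniformly random block starts,
                  values near 0 concentrate starts near the array's beginning
                  (high temporal locality)
    num_blocks -- number of blocks to generate
    """
    if not (0.0 < alpha <= 1.0):
        raise ValueError("alpha must be in (0, 1]")
    if not (1 <= block_len <= mem_words):
        raise ValueError("block_len must be in [1, mem_words]")

    rng = random.Random(seed)  # seeded so a probe can be replayed exactly
    num_starts = mem_words - block_len + 1  # keeps every block inside the array
    addresses = []
    for _ in range(num_blocks):
        r = rng.random()  # uniform in [0, 1)
        # Power-law skew: r ** (1 / alpha) stays in [0, 1) and shrinks toward 0
        # as alpha decreases, so small alpha reuses a small region repeatedly.
        start = int(num_starts * r ** (1.0 / alpha))
        # Spatial locality: L consecutive word addresses from the start.
        addresses.extend(range(start, start + block_len))
    return addresses


# Example: a cache-unfriendly probe (uniform starts, single-word blocks)
# versus a highly local one (small alpha, long contiguous blocks).
random_probe = apex_map_addresses(1 << 20, 1, 1.0, 8)
local_probe = apex_map_addresses(1 << 20, 64, 0.01, 8)
```

Sweeping `alpha` and `block_len` over a grid and replaying each stream against the memory under test is one way to cover the locality space with a family of probes; varying `block_len` also interacts with the request size the memory controller issues, which is where access granularity starts to matter.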
Characterizing the Performance of Hybrid Memory Cube Using ApexMAP Application Probes