A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC
- New Mexico State Univ., Las Cruces, NM (United States). Klipsch School of Electrical and Computer Engineering
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Caches have long been used to reduce the latency of main memory accesses by storing frequently used data near the processor, and processor performance depends on the underlying cache performance. Significant research has therefore gone into identifying the most important metrics of cache performance. Although most of that research measures cache hit rates and data movement as the primary cache performance metrics, cache utilization is also important. We investigate application locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insight into the dynamic behavior of parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, which are mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average, a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization alongside conventional performance metrics to give a holistic view of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that a variable cache line size can improve performance and also conserve power.
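The two utilization figures quoted in the abstract (the fraction of bytes in a line touched before eviction, and the average number of accesses per byte) can be illustrated with a minimal sketch. This is not the authors' simulator and does not use the Structural Simulation Toolkit; the class `UtilizationCache`, its method names, the fully associative LRU policy, and the 64-byte line size are all assumptions chosen for illustration. The average here is taken over every byte of every evicted line, which is one possible reading of the abstract's wording.

```python
from collections import OrderedDict


class UtilizationCache:
    """Toy fully associative LRU cache that counts per-byte accesses.

    Hypothetical illustration only: not the simulator used in the paper.
    """

    def __init__(self, num_lines, line_size=64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = OrderedDict()  # line tag -> list of per-byte access counts
        self.hits = 0
        self.misses = 0
        self.evicted_lines = 0
        self.bytes_touched = 0  # bytes accessed at least once, over evicted lines
        self.byte_accesses = 0  # total byte accesses, over evicted lines

    def access(self, addr, size=1):
        """Record an access of `size` bytes starting at byte address `addr`."""
        end = addr + size
        while addr < end:
            tag, offset = divmod(addr, self.line_size)
            # Portion of this access that falls inside the current line.
            span = min(end - addr, self.line_size - offset)
            counts = self._lookup(tag)
            for i in range(offset, offset + span):
                counts[i] += 1
            addr += span

    def _lookup(self, tag):
        """Return the per-byte counters for a line, loading it on a miss."""
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.num_lines:
                _, victim = self.lines.popitem(last=False)  # evict LRU line
                self._retire(victim)
            self.lines[tag] = [0] * self.line_size
        return self.lines[tag]

    def _retire(self, counts):
        """Fold an evicted line's counters into the running totals."""
        self.evicted_lines += 1
        self.bytes_touched += sum(1 for c in counts if c > 0)
        self.byte_accesses += sum(counts)

    def flush(self):
        """Evict all resident lines so they are included in the statistics."""
        while self.lines:
            _, victim = self.lines.popitem(last=False)
            self._retire(victim)

    def utilization(self):
        """Fraction of bytes in evicted lines that were accessed at least once."""
        total = self.evicted_lines * self.line_size
        return self.bytes_touched / total if total else 0.0

    def accesses_per_byte(self):
        """Average number of accesses per byte of evicted lines."""
        total = self.evicted_lines * self.line_size
        return self.byte_accesses / total if total else 0.0


if __name__ == "__main__":
    # Strided pattern: read 4 bytes out of every 16 across a 1 KiB region.
    cache = UtilizationCache(num_lines=4, line_size=64)
    for addr in range(0, 1024, 16):
        cache.access(addr, size=4)
    cache.flush()
    print("utilization:", cache.utilization())
    print("accesses per byte:", cache.accesses_per_byte())
    print("hits:", cache.hits, "misses:", cache.misses)
```

In the strided example, each 64-byte line has 16 of its bytes touched exactly once, so both metrics come out to 0.25 even though three out of four accesses hit in the cache. That gap between hit rate and utilization is the kind of effect the abstract argues hit rates alone do not expose. Sampling these counters at intervals, rather than once at the end, would give a time-varying view in the spirit of the study.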
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization:
- USDOE Laboratory Directed Research and Development (LDRD) Program; U.S. Army Research Laboratory (ARL); National Science Foundation (NSF)
- Grant/Contract Number:
- AC52-06NA25396; W911NF-07-2-0027; AC04-94AL85000
- OSTI ID:
- 1394977
- Alternate ID(s):
- OSTI ID: 1399561
- Report Number(s):
- LA-UR-17-24198; SAND-2017-8114J
- Journal Information:
- Journal of Supercomputing, Vol. 74, Issue 2; ISSN 0920-8542
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English
- Basic block distribution analysis to find periodic behavior and simulation points in applications (conference, January 2001)
- Run-time spatial locality detection and optimization (conference, January 1997)
- A New Metric to Measure Cache Utilization for HPC Workloads (conference, January 2016)
- Hitting the memory wall: implications of the obvious (journal, March 1995)
- Quantifying Locality In The Memory Access Patterns of HPC Applications (conference, January 2005)
- Performance characterization of the NAS Parallel Benchmarks in OpenCL (conference, November 2011)
- Subsetting the SPEC CPU2006 benchmark suite (journal, March 2007)
- LMStr: Local memory store the case for hardware controlled scratchpad memory for general purpose processors (conference, December 2016)
- Energy, Power, and Performance Characterization of GPGPU Benchmark Programs (conference, May 2016)
- Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite (journal, June 2007)
- The structural simulation toolkit (journal, March 2011)
- Pin: building customized program analysis tools with dynamic instrumentation (conference, January 2005)
- Scratchpad memory: design alternative for cache on-chip memory in embedded systems (conference, January 2002)
- Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy (conference, December 2012)
- Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor (journal, March 2010)
- Performance Characterization of SPEC CPU2006 Benchmarks on Intel and AMD Platform (conference, March 2009)
- SPEClite: using representative samples to reduce SPEC CPU2000 workload (conference, January 2001)
- Predicting whole-program locality through reuse distance analysis (journal, May 2003)
- Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation (conference, January 2004)
- Exploiting spatial locality in data caches using spatial footprints (conference, January 1998)
- Towards Performance Predictive Application-Dependent Workload Characterization (conference, November 2012)
- Evaluation techniques for storage hierarchies (journal, January 1970)
- Automatically characterizing large scale program behavior (journal, December 2002)
- Cache Utilization as a Locality Metric - A Case Study on the Mantevo Suite (conference, December 2016)
- A Benchmark Characterization of the EEMBC Benchmark Suite (journal, September 2009)
- Controlling cache utilization of HPC applications (conference, January 2011)
- LMStr: exploring shared hardware controlled scratchpad memory for multicores (conference, January 2017)
- Data analytics workloads: Characterization and similarity analysis (conference, December 2014)
- GraphBIG: understanding graph computing in the context of industrial solutions (conference, January 2015)
- Measuring benchmark similarity using inherent program characteristics (journal, June 2006)
- An Utilization Driven Framework for Energy Efficient Caches (book, January 2008)
- Benchmark characterization (journal, January 1991)
- The PARSEC benchmark suite: characterization and architectural implications (conference, January 2008)
- Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics (conference, October 2006)
- False sharing and spatial locality in multiprocessor caches (journal, June 1994)
- Performance Characterization of SPEC CPU2006 Integer Benchmarks on x86-64 Architecture (conference, October 2006)
- New tiling techniques to improve cache temporal locality (journal, May 1999)
- DAdHTM: Low overhead dynamically adaptive hardware transactional memory for large graphs a scalability study (conference, August 2017)
- The time-varying nature of cache utilization: A case study on the Mantevo and Apex benchmarks (conference, August 2017)
- Local memory store (LMStr): A hardware controlled shared scratchpad for multicores (conference, August 2017)
- Design trade-offs for emerging HPC processors based on mobile market technology (journal, March 2019)
Similar Records
Can high bandwidth and latency justify large cache blocks in scalable multiprocessors?
Device and method for cache utilization aware data compression