OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

Journal Article · Journal of Supercomputing
 [1];  [2];  [1];  [3]
  1. New Mexico State Univ., Las Cruces, NM (United States). Klipsch School of Electrical and Computer Engineering
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  3. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Caches have long been used to reduce the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the performance of the underlying cache, so significant research has gone into identifying the most important metrics of cache performance. Although the majority of that research focuses on cache hit rates and data movement as the primary cache performance metrics, cache utilization is also important. We investigate application locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insight into the dynamic behavior of parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, which are mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before the line is evicted. Also, for these applications, a byte is accessed on average two times before its cache line is evicted. Moreover, we present runtime cache utilization together with conventional performance metrics, which gives a holistic view of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that a variable cache line size can improve performance and can also conserve power.
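The two utilization figures quoted in the abstract (the fraction of bytes in a line touched before eviction, and the average number of accesses per byte) can be illustrated with a small sketch. The Python code below is not the authors' simulator and does not use the Structural Simulation Toolkit; it is a toy, fully associative LRU cache written only to show how such metrics might be computed. The class name `UtilizationCache`, the 64-byte line size, the LRU policy, and the choice to average accesses over all bytes of a line (touched or not) are all assumptions made for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed, not taken from the paper)


class UtilizationCache:
    """Toy fully associative LRU cache that keeps per-byte access counts.

    Hypothetical helper for illustration only; not the paper's simulator.
    """

    def __init__(self, num_lines, line_size=LINE_SIZE):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = OrderedDict()  # line address -> list of per-byte counts
        self.hits = 0
        self.misses = 0
        self.evicted = []           # per-byte counts of every evicted line

    def _lookup(self, line_addr):
        """Return the byte counters for a line, filling and evicting as needed."""
        if line_addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(line_addr)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.num_lines:
                # Evict the least recently used line and keep its counters.
                _, victim = self.lines.popitem(last=False)
                self.evicted.append(victim)
            self.lines[line_addr] = [0] * self.line_size
        return self.lines[line_addr]

    def access(self, addr, size=1):
        """Record an access to `size` bytes starting at byte address `addr`."""
        first = addr // self.line_size
        last = (addr + size - 1) // self.line_size
        for line_addr in range(first, last + 1):
            counts = self._lookup(line_addr)
            base = line_addr * self.line_size
            lo = max(addr, base)
            hi = min(addr + size, base + self.line_size)
            for byte_addr in range(lo, hi):
                counts[byte_addr - base] += 1

    def flush(self):
        """Evict all resident lines so they are included in the statistics."""
        while self.lines:
            _, victim = self.lines.popitem(last=False)
            self.evicted.append(victim)


def utilization(evicted):
    """Fraction of bytes accessed at least once before their line was evicted."""
    total = sum(len(counts) for counts in evicted)
    touched = sum(1 for counts in evicted for n in counts if n > 0)
    return touched / total if total else 0.0


def mean_accesses_per_byte(evicted):
    """Average access count per byte, over all bytes of all evicted lines."""
    total = sum(len(counts) for counts in evicted)
    accesses = sum(sum(counts) for counts in evicted)
    return accesses / total if total else 0.0


if __name__ == "__main__":
    cache = UtilizationCache(num_lines=4)
    # Strided read of 8-byte values, one every 32 bytes, over 1 KiB.
    for addr in range(0, 1024, 32):
        cache.access(addr, size=8)
    cache.flush()
    print(utilization(cache.evicted))             # 0.25
    print(mean_accesses_per_byte(cache.evicted))  # 0.25
    print(cache.hits, cache.misses)               # 16 16
```

In this strided example each 64-byte line is fetched in full but only 16 of its bytes are ever read, so the hit rate is 50% while utilization is only 25%. That gap between hit rate and utilization is the kind of behavior the abstract argues is missed when only conventional metrics are reported. Calling `utilization` on the lines evicted during successive intervals of a trace would give a time-varying view of the same metric.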

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE Laboratory Directed Research and Development (LDRD) Program; U.S. Army Research Laboratory (ARL); National Science Foundation (NSF)
Grant/Contract Number:
AC52-06NA25396; W911NF-07-2-0027; AC04-94AL85000
OSTI ID:
1394977
Alternate ID(s):
OSTI ID: 1399561
Report Number(s):
LA-UR-17-24198; SAND-2017-8114J
Journal Information:
Journal of Supercomputing, Vol. 74, Issue 2; ISSN 0920-8542
Publisher:
Springer
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 1 work (citation information provided by Web of Science)

References (40)

Basic block distribution analysis to find periodic behavior and simulation points in applications · conference · January 2001
Run-time spatial locality detection and optimization · conference · January 1997
A New Metric to Measure Cache Utilization for HPC Workloads · conference · January 2016
Hitting the memory wall: implications of the obvious · journal · March 1995
Quantifying Locality In The Memory Access Patterns of HPC Applications · conference · January 2005
Performance characterization of the NAS Parallel Benchmarks in OpenCL · conference · November 2011
Subsetting the SPEC CPU2006 benchmark suite · journal · March 2007
LMStr: Local memory store the case for hardware controlled scratchpad memory for general purpose processors · conference · December 2016
Energy, Power, and Performance Characterization of GPGPU Benchmark Programs · conference · May 2016
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite · journal · June 2007
The structural simulation toolkit · journal · March 2011
Pin: building customized program analysis tools with dynamic instrumentation · conference · January 2005
Scratchpad memory: design alternative for cache on-chip memory in embedded systems · conference · January 2002
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy · conference · December 2012
Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor · journal · March 2010
Performance Characterization of SPEC CPU2006 Benchmarks on Intel and AMD Platform · conference · March 2009
SPEClite: using representative samples to reduce SPEC CPU2000 workload · conference · January 2001
Predicting whole-program locality through reuse distance analysis · journal · May 2003
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation · conference · January 2004
Exploiting spatial locality in data caches using spatial footprints · conference · January 1998
  • Kumar, S.; Wilkerson, C.
  • ISCA 98: International Symposium on Computer Architecture, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) https://doi.org/10.1109/ISCA.1998.694794
Towards Performance Predictive Application-Dependent Workload Characterization · conference · November 2012
  • Alkohlani, Waleed; Cook, Jeanine
  • 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion: High Performance Computing, Networking Storage and Analysis https://doi.org/10.1109/SC.Companion.2012.62
Evaluation techniques for storage hierarchies · journal · January 1970
Automatically characterizing large scale program behavior · journal · December 2002
Cache Utilization as a Locality Metric - A Case Study on the Mantevo Suite · conference · December 2016
  • Siddique, Nafiul Alam; Grubel, Patricia; Badawy, Abdel-Hameed A.
  • 2016 International Conference on Computational Science and Computational Intelligence (CSCI) https://doi.org/10.1109/CSCI.2016.0110
A Benchmark Characterization of the EEMBC Benchmark Suite · journal · September 2009
Controlling cache utilization of HPC applications · conference · January 2011
LMStr: exploring shared hardware controlled scratchpad memory for multicores · conference · January 2017
Data analytics workloads: Characterization and similarity analysis · conference · December 2014
  • Panda, Reena; John, Lizy Kurian
  • 2014 IEEE International Performance Computing and Communications Conference (IPCCC), 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC) https://doi.org/10.1109/PCCC.2014.7017065
GraphBIG: understanding graph computing in the context of industrial solutions · conference · January 2015
  • Nai, Lifeng; Xia, Yinglong; Tanase, Ilie G.
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15 https://doi.org/10.1145/2807591.2807626
Measuring benchmark similarity using inherent program characteristics · journal · June 2006
An Utilization Driven Framework for Energy Efficient Caches · book · January 2008
Benchmark characterization · journal · January 1991
The PARSEC benchmark suite: characterization and architectural implications · conference · January 2008
  • Bienia, Christian; Kumar, Sanjeev; Singh, Jaswinder Pal
  • Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08 https://doi.org/10.1145/1454115.1454128
Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics · conference · October 2006
False sharing and spatial locality in multiprocessor caches · journal · June 1994
Performance Characterization of SPEC CPU2006 Integer Benchmarks on x86-64 Architecture · conference · October 2006
New tiling techniques to improve cache temporal locality · journal · May 1999
DAdHTM: Low overhead dynamically adaptive hardware transactional memory for large graphs a scalability study · conference · August 2017
  • Qayum, Mohammad; Badawy, Abdel-Hameed A.; Cook, Jeanine
  • 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) https://doi.org/10.1109/UIC-ATC.2017.8397653
The time-varying nature of cache utilization: A case study on the Mantevo and Apex benchmarks · conference · August 2017
  • Siddique, Nafiul Alam; Grubel, Patricia A.; Badawy, Abdel-Hameed A.
  • 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) https://doi.org/10.1109/UIC-ATC.2017.8397629
Local memory store (LMStr): A hardware controlled shared scratchpad for multicores · conference · August 2017
  • Siddique, Nafiul A.; Badawy, Abdel-Hameed A.; Cook, Jeanine
  • 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) https://doi.org/10.1109/UIC-ATC.2017.8397630

Cited By (1)

Design trade-offs for emerging HPC processors based on mobile market technology · journal · March 2019