A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

Siddique, Nafiul A.; Grubel, Patricia A.; Badawy, Abdel-Hameed A.; Cook, Jeanine

doi:10.1007/s11227-017-2144-1

Journal Article · 20 September 2017 · Journal of Supercomputing

DOI: https://doi.org/10.1007/s11227-017-2144-1 · OSTI ID: 1394977

Siddique, Nafiul A. [1]; Grubel, Patricia A. [2]; Badawy, Abdel-Hameed A. [1]; Cook, Jeanine [3]

[1] New Mexico State Univ., Las Cruces, NM (United States). Klipsch School of Electrical and Computer Engineering
[2] Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
[3] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Caches have long been used to reduce the latency of main memory accesses by storing frequently used data near the processor. Because processor performance depends on the underlying cache performance, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on cache hit rates and data movement as the primary cache performance metrics, cache utilization is also important. We investigate application locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses, providing detailed insight into the dynamic behavior of parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, which are mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average, a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization as well as conventional performance metrics, which together give a holistic understanding of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that a variable cache line size can result in better performance and can also conserve power.
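
The two utilization figures quoted in the abstract (the share of bytes in a line that are touched before eviction, and the number of accesses per byte) can be illustrated with a toy model. The sketch below is not the authors' SST-based simulator. It assumes a tiny direct-mapped cache with 64-byte lines, and the names `simulate`, `LINE_SIZE`, and `NUM_SETS`, as well as the choice to average accesses over touched bytes only, are assumptions made for illustration.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)
NUM_SETS = 4    # number of lines in a tiny direct-mapped cache (assumed)


@dataclass
class Line:
    tag: int
    # One access counter per byte of the line.
    counts: list = field(default_factory=lambda: [0] * LINE_SIZE)


def simulate(accesses):
    """Replay (address, size_in_bytes) accesses through the toy cache.

    Returns (byte_utilization, accesses_per_touched_byte), aggregated over
    every line that was evicted or was still resident at the end.
    """
    cache = {}     # set index -> resident Line
    finished = []  # per-byte counters of lines whose residency has ended
    for addr, size in accesses:
        for a in range(addr, addr + size):
            block = a // LINE_SIZE
            index = block % NUM_SETS
            tag = block // NUM_SETS
            line = cache.get(index)
            if line is None or line.tag != tag:
                # Miss: record the evicted line's counters, then fill.
                if line is not None:
                    finished.append(line.counts)
                line = Line(tag)
                cache[index] = line
            line.counts[a % LINE_SIZE] += 1
    # Treat lines still resident at the end as if they were evicted now.
    finished.extend(line.counts for line in cache.values())

    total_bytes = len(finished) * LINE_SIZE
    touched = sum(1 for counts in finished for n in counts if n > 0)
    total_accesses = sum(sum(counts) for counts in finished)
    if touched == 0:
        return 0.0, 0.0
    return touched / total_bytes, total_accesses / touched


if __name__ == "__main__":
    # Two 8-byte reads of the same address, then a conflicting 4-byte read.
    util, reuse = simulate([(0, 8), (0, 8), (256, 4)])
    print(f"byte utilization: {util:.3f}, accesses per touched byte: {reuse:.3f}")
```

In this toy trace only 12 of the 128 bytes brought into the cache are ever touched, which is the kind of waste that motivates the abstract's suggestion of a variable cache line size.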

Research Organization: Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE Laboratory Directed Research and Development (LDRD) Program; U.S. Army Research Laboratory (ARL); National Science Foundation (NSF)

Grant/Contract Number: AC52-06NA25396; W911NF-07-2-0027; AC04-94AL85000

OSTI ID: 1394977

Alternate ID(s): OSTI ID: 1399561

Report Number(s): LA-UR-17-24198; SAND-2017-8114J

Journal Information: Journal of Supercomputing, Vol. 74, Issue 2; ISSN 0920-8542

Publisher: Springer

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 1 work

Citation information provided by Web of Science

References (40)

Basic block distribution analysis to find periodic behavior and simulation points in applications Sherwood, T.; Perelman, E.; Calder, B. Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques https://doi.org/10.1109/PACT.2001.953283	conference	January 2001
Run-time spatial locality detection and optimization Johnson, T. L.; Merten, M. C.; Hwu, W. W. Proceedings of 30th Annual International Symposium on Microarchitecture https://doi.org/10.1109/MICRO.1997.645797	conference	January 1997
A New Metric to Measure Cache Utilization for HPC Workloads Deshpande, Aditya M.; Draper, Jeffrey T. Proceedings of the Second International Symposium on Memory Systems - MEMSYS '16 https://doi.org/10.1145/2989081.2989125	conference	January 2016
Hitting the memory wall: implications of the obvious Wulf, Wm. A.; McKee, Sally A. ACM SIGARCH Computer Architecture News, Vol. 23, Issue 1 https://doi.org/10.1145/216585.216588	journal	March 1995
Quantifying Locality In The Memory Access Patterns of HPC Applications Weinberg, J.; McCracken, M. O.; Strohmaier, E. ACM/IEEE SC 2005 Conference (SC'05) https://doi.org/10.1109/SC.2005.59	conference	January 2005
Performance characterization of the NAS Parallel Benchmarks in OpenCL Seo, Sangmin; Jo, Gangwon; Lee, Jaejin 2011 IEEE International Symposium on Workload Characterization (IISWC) https://doi.org/10.1109/IISWC.2011.6114174	conference	November 2011
Subsetting the SPEC CPU2006 benchmark suite Phansalkar, Aashish; Joshi, Ajay; John, Lizy K. ACM SIGARCH Computer Architecture News, Vol. 35, Issue 1 https://doi.org/10.1145/1241601.1241616	journal	March 2007
LMStr: Local memory store the case for hardware controlled scratchpad memory for general purpose processors Siddique, Nafiul Alam; Badawy, Abdel-Hameed A.; Cook, Jeanine 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC) https://doi.org/10.1109/PCCC.2016.7820661	conference	December 2016
Energy, Power, and Performance Characterization of GPGPU Benchmark Programs Coplin, Jared; Burtscher, Martin 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) https://doi.org/10.1109/IPDPSW.2016.164	conference	May 2016
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite Phansalkar, Aashish; Joshi, Ajay; John, Lizy K. ACM SIGARCH Computer Architecture News, Vol. 35, Issue 2 https://doi.org/10.1145/1273440.1250713	journal	June 2007
The structural simulation toolkit Rodrigues, A. F.; CooperBalls, E.; Jacob, B. ACM SIGMETRICS Performance Evaluation Review, Vol. 38, Issue 4 https://doi.org/10.1145/1964218.1964225	journal	March 2011
Pin: building customized program analysis tools with dynamic instrumentation Luk, Chi-Keung; Cohn, Robert; Muth, Robert Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation - PLDI '05 https://doi.org/10.1145/1065010.1065034	conference	January 2005
Scratchpad memory: design alternative for cache on-chip memory in embedded systems Banakar, Rajeshwari; Steinke, Stefan; Lee, Bo-Sik Proceedings of the tenth international symposium on Hardware/software codesign - CODES '02 https://doi.org/10.1145/774789.774805	conference	January 2002
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy Kumar, Snehasish; Zhao, Hongzhou; Shriraman, Arrvindh 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) https://doi.org/10.1109/MICRO.2012.42	conference	December 2012
Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor Conway, Pat; Kalyanasundharam, Nathan; Donley, Gregg IEEE Micro, Vol. 30, Issue 2 https://doi.org/10.1109/MM.2010.31	journal	March 2010
Performance Characterization of SPEC CPU2006 Benchmarks on Intel and AMD Platform Li, Shengmei; Cheng, Buqi; Gao, Xingyu 2009 First International Workshop on Education Technology and Computer Science https://doi.org/10.1109/ETCS.2009.288	conference	March 2009
SPEClite: using representative samples to reduce SPEC CPU2000 workload Todi, R. Proceedings of the Fourth Annual IEEE International Workshop on Workload Characterization. WWC-4 (Cat. No.01EX538) https://doi.org/10.1109/WWC.2001.990740	conference	January 2001
Predicting whole-program locality through reuse distance analysis Ding, Chen; Zhong, Yutao ACM SIGPLAN Notices, Vol. 38, Issue 5 https://doi.org/10.1145/780822.781159	journal	May 2003
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation Patil, H.; Cohn, R.; Charney, M. 37th International Symposium on Microarchitecture (MICRO-37'04) https://doi.org/10.1109/MICRO.2004.28	conference	January 2004
Exploiting spatial locality in data caches using spatial footprints Kumar, S.; Wilkerson, C. ISCA 98: International Symposium on Computer Architecture, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) https://doi.org/10.1109/ISCA.1998.694794	conference	January 1998
Towards Performance Predictive Application-Dependent Workload Characterization Alkohlani, Waleed; Cook, Jeanine 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC) https://doi.org/10.1109/SC.Companion.2012.62	conference	November 2012
Evaluation techniques for storage hierarchies Mattson, R. L.; Gecsei, J.; Slutz, D. R. IBM Systems Journal, Vol. 9, Issue 2 https://doi.org/10.1147/sj.92.0078	journal	January 1970
Automatically characterizing large scale program behavior Sherwood, Timothy; Perelman, Erez; Hamerly, Greg ACM SIGOPS Operating Systems Review, Vol. 36, Issue 5 https://doi.org/10.1145/635508.605403	journal	December 2002
Cache Utilization as a Locality Metric - A Case Study on the Mantevo Suite Siddique, Nafiul Alam; Grubel, Patricia; Badawy, Abdel-Hameed A. 2016 International Conference on Computational Science and Computational Intelligence (CSCI) https://doi.org/10.1109/CSCI.2016.0110	conference	December 2016
A Benchmark Characterization of the EEMBC Benchmark Suite Poovey, Jason A.; Conte, Thomas M.; Levy, Markus IEEE Micro, Vol. 29, Issue 5 https://doi.org/10.1109/MM.2009.74	journal	September 2009
Controlling cache utilization of HPC applications Perarnau, Swann; Tchiboukdjian, Marc; Huard, Guillaume Proceedings of the international conference on Supercomputing - ICS '11 https://doi.org/10.1145/1995896.1995942	conference	January 2011
LMStr: exploring shared hardware controlled scratchpad memory for multicores Siddique, Nafiul Alam; Badawy, Abdel-Hameed A.; Cook, Jeanine Proceedings of the International Symposium on Memory Systems - MEMSYS '17 https://doi.org/10.1145/3132402.3132440	conference	January 2017
Data analytics workloads: Characterization and similarity analysis Panda, Reena; John, Lizy Kurian 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC) https://doi.org/10.1109/PCCC.2014.7017065	conference	December 2014
GraphBIG: understanding graph computing in the context of industrial solutions Nai, Lifeng; Xia, Yinglong; Tanase, Ilie G. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15 https://doi.org/10.1145/2807591.2807626	conference	January 2015
Measuring benchmark similarity using inherent program characteristics IEEE Transactions on Computers, Vol. 55, Issue 6 https://doi.org/10.1109/TC.2006.85	journal	June 2006
An Utilization Driven Framework for Energy Efficient Caches Ramaswamy, Subramanian; Yalamanchili, Sudhakar High Performance Computing - HiPC 2008 https://doi.org/10.1007/978-3-540-89894-8_50	book	January 2008
Benchmark characterization Conte, T. M.; Hwu, W. -M. W. Computer, Vol. 24, Issue 1 https://doi.org/10.1109/2.67193	journal	January 1991
The PARSEC benchmark suite: characterization and architectural implications Bienia, Christian; Kumar, Sanjeev; Singh, Jaswinder Pal Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08 https://doi.org/10.1145/1454115.1454128	conference	January 2008
Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics Hoste, Kenneth; Eeckhout, Lieven 2006 IEEE International Symposium on Workload Characterization https://doi.org/10.1109/IISWC.2006.302732	conference	October 2006
False sharing and spatial locality in multiprocessor caches Torrellas, J.; Lam, H. S.; Hennessy, J. L. IEEE Transactions on Computers, Vol. 43, Issue 6 https://doi.org/10.1109/12.286299	journal	June 1994
Performance Characterization of SPEC CPU2006 Integer Benchmarks on x86-64 Architecture Ye, Dong; Ray, Joydeep; Harle, Christophe 2006 IEEE International Symposium on Workload Characterization https://doi.org/10.1109/IISWC.2006.302736	conference	October 2006
New tiling techniques to improve cache temporal locality Song, Yonghong; Li, Zhiyuan ACM SIGPLAN Notices, Vol. 34, Issue 5 https://doi.org/10.1145/301631.301668	journal	May 1999
DAdHTM: Low overhead dynamically adaptive hardware transactional memory for large graphs a scalability study Qayum, Mohammad; Badawy, Abdel-Hameed A.; Cook, Jeanine 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) https://doi.org/10.1109/UIC-ATC.2017.8397653	conference	August 2017
The time-varying nature of cache utilization: A case study on the Mantevo and Apex benchmarks Siddique, Nafiul Alam; Grubel, Patricia A.; Badawy, Abdel-Hameed A. 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) https://doi.org/10.1109/UIC-ATC.2017.8397629	conference	August 2017
Local memory store (LMStr): A hardware controlled shared scratchpad for multicores Siddique, Nafiul A.; Badawy, Abdel-Hameed A.; Cook, Jeanine 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) https://doi.org/10.1109/UIC-ATC.2017.8397630	conference	August 2017

Cited By (1)

Design trade-offs for emerging HPC processors based on mobile market technology Armejach, Adrià; Casas, Marc; Moretó, Miquel The Journal of Supercomputing, Vol. 75, Issue 9 https://doi.org/10.1007/s11227-019-02819-4	journal	March 2019

Related Subjects

97 MATHEMATICS AND COMPUTING
Cache Utilization
Locality
Workload Characterization
Cache Line Utilization
Multicore Cache Simulation
Runtime Evaluation
Scratchpad
