Performance Metrics and Models for Shared Cache

  • Survey
  • Journal of Computer Science and Technology

Abstract

Performance metrics and models are prerequisites for scientific understanding and optimization. This paper introduces a new footprint-based theory and reviews the research of the past four decades that led to it. The review groups past work into metrics and their models (in particular those of reuse distance), metrics conversion, models of shared cache, performance and optimization, and other related techniques.
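To make the two central metrics concrete, here is a minimal sketch of our own, not the paper's method. It computes reuse distances (taken here as the number of distinct data accessed between two consecutive accesses to the same datum; some authors also count the datum itself) and the average footprint fp(w), the mean number of distinct data over all windows of length w. Both are computed by brute force, not with the efficient algorithms the surveyed literature describes. It then applies the footprint-to-miss-ratio conversion from the HOTL paper (Xiang et al., ASPLOS 2013), which takes the miss ratio of a fully associative LRU cache of size fp(w) to be fp(w+1) - fp(w). The function names and the toy trace are illustrative assumptions.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """For each access, the number of distinct data touched since the previous
    access to the same datum; None marks a first access (infinite distance)."""
    lru = OrderedDict()  # keys ordered from least to most recently used
    dists = []
    for x in trace:
        if x in lru:
            keys = list(lru)
            # Items after x in LRU order were accessed since x's last use.
            dists.append(len(keys) - keys.index(x) - 1)
            lru.move_to_end(x)
        else:
            dists.append(None)
            lru[x] = True
    return dists


def footprint(trace, w):
    """Average number of distinct data over all windows of length w (brute force)."""
    n = len(trace)
    if not 1 <= w <= n:
        raise ValueError("window length must be between 1 and len(trace)")
    windows = n - w + 1
    return sum(len(set(trace[i:i + w])) for i in range(windows)) / windows


def miss_ratio_curve(trace, max_w):
    """HOTL-style conversion (assumed form): the miss ratio at cache size fp(w)
    is fp(w + 1) - fp(w). Returns (cache_size, miss_ratio) pairs for w < max_w."""
    curve = []
    for w in range(1, max_w):
        c = footprint(trace, w)
        curve.append((c, footprint(trace, w + 1) - c))
    return curve


if __name__ == "__main__":
    trace = list("abcabc")  # hypothetical toy trace: three data accessed cyclically
    print(reuse_distances(trace))
    print([footprint(trace, w) for w in range(1, 5)])
    print(miss_ratio_curve(trace, 4))
```

On the cyclic trace `abcabc`, every reuse has distance 2, so an LRU cache of one or two blocks misses on each reuse while a three-block cache hits; the converted curve reports miss ratios of 1, 1 and 0 for cache sizes 1, 2 and 3, as expected. The brute-force footprint costs O(n·w) per window length, so it is only suitable for toy traces.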

References

  1. Zhang X, Dwarkadas S, Shen K. Towards practical page coloring-based multicore cache management. In Proc. the EuroSys Conference, April 2009, pp.89-102.

  2. Denning P J. Working sets past and present. IEEE Transactions on Software Engineering, 1980, 6(1): 64-84.

  3. Denning P J. The working set model for program behaviour. Communications of the ACM, 1968, 11(5): 323-333.

  4. Brock J, Luo H, Ding C. Locality analysis: A nonillion time window problem. In Proc. Big Data Analytics Workshop, June 2013.

  5. Zhong Y, Shen X, Ding C. Program locality analysis using reuse distance. ACM TOPLAS, 2009, 31(6): 1-39.

  6. Zhong Y, Orlovich M, Shen X, Ding C. Array regrouping and structure splitting using whole-program reference affinity. In Proc. PLDI, June 2004, pp.255-266.

  7. Ding C, Chilimbi T. All-window profiling of concurrent executions. In Proc. the 13th PPoPP (Poster Paper), Feb. 2008, pp.265-266.

  8. Xiang X, Bao B, Bai T, Ding C, Chilimbi T M. All-window profiling and composable models of cache sharing. In Proc. PPoPP, Feb. 2011, pp.91-102.

  9. Xiang X, Bao B, Ding C, Gao Y. Linear-time modeling of program working set in shared cache. In Proc. PACT, Oct. 2011, pp.350-360.

  10. Xiang X, Ding C, Luo H, Bao B. HOTL: A higher order theory of locality. In Proc. ASPLOS, March 2013, pp.343-356.

  11. Xiang X, Bao B, Ding C, Shen K. Cache conscious task regrouping on multicore processors. In Proc. the 12th CCGrid, May 2012, pp.603-611.

  12. Xiang X. A higher order theory of locality and its application in multicore cache management [Ph.D. Thesis]. Computer Science Dept., Univ. of Rochester, 2014.

  13. Wu M, Yeung D. Coherent profiles: Enabling efficient reuse distance analysis of multicore scaling for loop-based parallel programs. In Proc. PACT, Oct. 2011, pp.264-275.

  14. Wu M, Zhao M, Yeung D. Studying multicore processor scaling via reuse distance analysis. In Proc. the 40th ISCA, June 2013, pp.499-510.

  15. Thiébaut D, Stone H S. Footprints in the cache. ACM Transactions on Computer Systems, 1987, 5(4): 305-329.

  16. Suh G E, Devadas S, Rudolph L. Analytical cache models with applications to cache partitioning. In Proc. the 15th ICS, June 2001, pp.1-12.

  17. Chandra D, Guo F, Kim S, Solihin Y. Predicting inter-thread cache contention on a chip multi-processor architecture. In Proc. the 11th HPCA, Feb. 2005, pp.340-351.

  18. Belady L A. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 1966, 5(2): 78-101.

  19. Denning P J. Thrashing: Its causes and prevention. In Proc. AFIPS Fall Joint Computer Conference, Part 1, Dec. 1968, pp.915-922.

  20. Chilimbi T M, Hirzel M. Dynamic hot data stream prefetching for general-purpose programs. In Proc. PLDI, June 2002, pp.199-209.

  21. Mattson R L, Gecsei J, Slutz D, Traiger I L. Evaluation techniques for storage hierarchies. IBM Systems Journal, 1970, 9(2): 78-117.

  22. Jiang S, Zhang X. LIRS: An efficient low inter-reference recency set replacement to improve buffer cache performance. In Proc. SIGMETRICS, June 2002, pp.31-42.

  23. Smith A J. On the effectiveness of set associative page mapping and its applications in main memory management. In Proc. the 2nd ICSE, Oct. 1976, pp.286-292.

  24. Hill M D, Smith A J. Evaluating associativity in CPU caches. IEEE Transactions on Computers, 1989, 38(12): 1612-1630.

  25. Marin G, Mellor-Crummey J. Cross architecture performance predictions for scientific applications using parameterized models. In Proc. SIGMETRICS, June 2004, pp.2-13.

  26. Snir M, Yu J. On the theory of spatial and temporal locality. Technical Report, DCS-R-2005-2564, Computer Science Dept., Univ. of Illinois at Urbana-Champaign, 2005.

  27. Fang C, Carr S, Önder S, Wang Z. Path-based reuse distance analysis. In Proc. the 15th CC, Mar. 2006, pp.32-46.

  28. Zhong Y, Dropsho S G, Shen X, Studer A, Ding C. Miss rate prediction across program inputs and cache configurations. IEEE Transactions on Computers, 2007, 56(3): 328-343.

  29. Fang C, Carr S, Önder S, Wang Z. Instruction based memory distance analysis and its application to optimization. In Proc. PACT, Sept. 2005, pp.27-37.

  30. Beyls K, D'Hollander E H. Discovery of locality-improving refactorings by reuse path analysis. In Proc. the 2nd Int. Conf. High Performance Computing and Communications, Sept. 2006, pp.220-229.

  31. Beyls K, D'Hollander E H. Intermediately executed code is the key to find refactorings that improve temporal data locality. In Proc. the 3rd ACM Conference on Computing Frontiers, May 2006, pp.373-382.

  32. Kelly T, Cohen I, Goldszmidt M, Keeton K. Inducing models of black-box storage arrays. Technical Report, HPL-2004-108, HP Laboratories Palo Alto, 2004.

  33. Almeida V, Bestavros A, Crovella M, de Oliveira A. Characterizing reference locality in the WWW. In Proc. the 4th International Conference on Parallel and Distributed Information Systems (PDIS), December 1996, pp.92-103.

  34. Bennett B T, Kruskal V J. LRU stack processing. IBM Journal of Research and Development, 1975, 19(4): 353-357.

  35. Olken F. Efficient methods for calculating the success function of fixed space replacement policies. Technical Report, LBL-12370, Lawrence Berkeley Laboratory, 1981.

  36. Ding C, Zhong Y. Predicting whole-program locality through reuse distance analysis. In Proc. PLDI, June 2003, pp.245-257.

  37. Zhong Y, Ding C, Kennedy K. Reuse distance analysis for scientific programs. In Proc. Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers, March 2002.

  38. Schuff D L, Kulkarni M, Pai V S. Accelerating multicore reuse distance analysis with sampling and parallelization. In Proc. the 19th PACT, Sept. 2010, pp.53-64.

  39. Kim Y H, Hill M D, Wood D A. Implementing stack simulation for highly-associative memories. In Proc. SIGMETRICS, May 1991, pp.212-213.

  40. Sugumar R A, Abraham S G. Multi-configuration simulation algorithms for the evaluation of computer architecture designs. Technical Report, University of Michigan, August 1993.

  41. Burger D, Austin T. The SimpleScalar tool set, version 2.0. Technical Report, CS-TR-97-1342, Department of Computer Science, University of Wisconsin, June 1997.

  42. Almasi G, Cascaval C, Padua D A. Calculating stack distances efficiently. In Proc. the ACM SIGPLAN Workshop on Memory System Performance, June 2002, pp.37-43.

  43. Denning P J, Schwartz S C. Properties of the working set model. Communications of the ACM, 1972, 15(3): 191-198.

  44. Berg E, Hagersten E. StatCache: A probabilistic approach to efficient and accurate data locality analysis. In Proc. ISPASS, March 2004, pp.20-27.

  45. Berg E, Hagersten E. Fast data-locality profiling of native execution. In Proc. SIGMETRICS, June 2005, pp.169-180.

  46. Eklov D, Hagersten E. StatStack: Efficient modeling of LRU caches. In Proc. ISPASS, March 2010, pp.55-65.

  47. Eklov D, Black-Schaffer D, Hagersten E. Fast modeling of shared caches in multicore systems. In Proc. the 6th HiPEAC, Jan. 2011, pp.147-157.

  48. Shen X, Shaw J, Meeker B, Ding C. Locality approximation using time. In Proc. the 34th POPL, Jan. 2007, pp.55-61.

  49. Shen X, Shaw J. Scalable implementation of efficient locality approximation. In Proc. the 21st LCPC Workshop, July 31-August 2, 2008, pp.202-216.

  50. Jiang Y, Zhang E Z, Tian K, Shen X. Is reuse distance applicable to data locality analysis on chip multiprocessors? In Proc. the 19th CC, Mar. 2010, pp.264-282.

  51. Shen X, Shaw J, Meeker B, Ding C. Locality approximation using time. Technical Report, TR 901, Department of Computer Science, University of Rochester, December 2006.

  52. Jiang Y, Tian K, Shen X. Combining locality analysis with online proactive job co-scheduling in chip multiprocessors. In Proc. HiPEAC, Jan. 2010, pp.201-215.

  53. West R, Zaroo P, Waldspurger C A, Zhang X. Online cache modeling for commodity multicore processors. Operating Systems Review, 2010, 44(4): 19-29.

  54. Fedorova A, Seltzer M, Smith M D. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proc. the 16th PACT, Sept. 2007, pp.25-38.

  55. Zhou S. An efficient simulation algorithm for cache of random replacement policy. In Proc. the IFIP Int. Conf. Network and Parallel Computing, Sept. 2010, pp.144-154.

  56. Arnold M, Ryder B G. A framework for reducing the cost of instrumented code. In Proc. PLDI, June 2001, pp.168-179.

  57. Hirzel M, Chilimbi T M. Bursty tracing: A framework for low-overhead temporal profiling. In Proc. ACM Workshop on Feedback-Directed and Dynamic Optimization, Dec. 2001.

  58. Cascaval C, Duesterwald E, Sweeney P F, Wisniewski R W. Multiple page size modeling and optimization. In Proc. the 14th PACT, Sept. 2005, pp.339-349.

  59. Zhong Y, Chang W. Sampling-based program locality approximation. In Proc. the 7th ISMM, June 2008, pp.91-100.

  60. Tam D K, Azimi R, Soares L, Stumm M. RapidMRC: Approximating L2 miss rate curves on commodity systems for online optimizations. In Proc. the 14th ASPLOS, Mar. 2009, pp.121-132.

  61. Niu Q, Dinan J, Lu Q, Sadayappan P. PARDA: A fast parallel reuse distance analysis algorithm. In Proc. IPDPS, May 2012.

  62. Cui H, Yi Q, Xue J, Wang L, Yang Y, Feng X. A highly parallel reuse distance analysis algorithm on GPUs. In Proc. the 26th IPDPS, May 2012, pp.1284-1294.

  63. Gupta S, Xiang P, Yang Y, Zhou H. Locality principle revisited: A probability-based quantitative approach. In Proc. the 26th IPDPS, May 2012, pp.995-1009.

  64. Moseley T, Shye A, Reddi V J, Grunwald D, Peri R. Shadow profiling: Hiding instrumentation costs with parallelism. In Proc. CGO, March 2007, pp.198-208.

  65. Wallace S, Hazelwood K. Superpin: Parallelizing dynamic instrumentation for real-time performance. In Proc. CGO, Mar. 2007, pp.209-220.

  66. Cascaval C, Padua D A. Estimating cache misses and locality using stack distances. In Proc. the 17th ICS, June 2003, pp.150-159.

  67. Allen R, Kennedy K. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers, 2001.

  68. Beyls K, D'Hollander E H. Generating cache hints for improved program efficiency. Journal of Systems Architecture, 2005, 51(4): 223-250.

  69. Pugh W, Wonnacott D. Eliminating false data dependences using the Omega test. In Proc. PLDI, June 1992, pp.140-151.

  70. Chauhan A, Shei C Y. Static reuse distances for locality-based optimizations in MATLAB. In Proc. the 24th ICS, June 2010, pp.295-304.

  71. Shen X, Gao Y, Ding C et al. Lightweight reference affinity analysis. In Proc. the 19th ICS, June 2005, pp.131-140.

  72. Bao B, Ding C. Defensive loop tiling for shared cache. In Proc. CGO, Feb. 2013, pp.1-11.

  73. Bao B. Peer-aware program optimization [Ph.D. Thesis]. Computer Science Dept., Univ. of Rochester, January 2013.

  74. Yuan L, Ding C, Štefankovič D, Zhang Y. Modeling the locality in graph traversals. In Proc. the 41st ICPP, Sept. 2012, pp.138-147.

  75. Agarwal A, Hennessy J L, Horowitz M. Cache performance of operating system and multiprogramming workloads. ACM Transactions on Computer Systems, 1988, 6(4): 393-431.

  76. Ding C, Chilimbi T. A composable model for analyzing locality of multi-threaded programs. Technical Report, MSR-TR-2009-107, Microsoft Research, August 2009.

  77. Strohmaier E, Shan H. APEX-Map: A parameterized scalable memory access probe for high-performance computing systems. Concurrency and Computation: Practice and Experience, 2007, 19(17): 2185-2205.

  78. Ibrahim K Z, Strohmaier E. Characterizing the relation between Apex-Map synthetic probes and reuse distance distributions. In Proc. ICPP, Sept. 2010, pp.353-362.

  79. He L, Yu Z, Jin H. FractalMRC: Online cache miss rate curve prediction on commodity systems. In Proc. IPDPS, May 2012, pp.1341-1351.

  80. Saltzer J H. A simple linear model of demand paging performance. Communications of the ACM, 1974, 17(4): 181-186.

  81. Strecker W D. Transient behavior of cache memories. ACM Transactions on Computer Systems, 1983, 1(4): 281-293.

  82. King W F. Analysis of demand paging algorithms. In Proc. IFIP Congress, August 1971, pp.485-490.

  83. Fagin R, Price T G. Efficient calculation of expected miss ratios in the independent reference model. SIAM Journal on Computing, 1978, 7(3): 288-297.

  84. Dan A, Towsley D F. An approximate analysis of the LRU and FIFO buffer replacement schemes. In Proc. SIGMETRICS, May 1990, pp.143-152.

  85. Gu X, Ding C. Reuse distance distribution in random access. Technical Report, URCS #930, University of Rochester, January 2008.

  86. Denning P J, Slutz D R. Generalized working sets for segment reference strings. Communications of the ACM, 1978, 21(9): 750-759.

  87. Easton M C, Fagin R. Cold-start vs. warm-start miss ratios. Communications of the ACM, 1978, 21(10): 866-872.

  88. Shedler G, Tung C. Locality in page reference strings. SIAM Journal on Computing, 1972, 1(3): 218-241.

  89. Stone H S, Turek J, Wolf J L. Optimal partitioning of cache memory. IEEE Transactions on Computers, 1992, 41(9): 1054-1068.

  90. Thiébaut D, Stone H S, Wolf J L. Improving disk cache hit-ratios through cache partitioning. IEEE Transactions on Computers, 1992, 41(6): 665-676.

  91. Falsafi B, Wood D A. Modeling cost/performance of a parallel computer simulator. ACM Transactions on Modeling and Computer Simulation, 1997, 7(1): 104-130.

  92. Wu M J, Yeung D. Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis. In Proc. the ACM SIGPLAN Workshop on Memory System Performance and Correctness, June 2012, pp.2-11.

  93. Fedorova A, Blagodurov S, Zhuravlev S. Managing contention for shared resources on multicore processors. Communications of the ACM, 2010, 53(2): 49-57.

  94. Zhuravlev S, Blagodurov S, Fedorova A. Addressing shared resource contention in multicore processors via scheduling. In Proc. ASPLOS, March 2010, pp.129-142.

  95. Blagodurov S, Zhuravlev S, Fedorova A. Contention-aware scheduling on multicore systems. ACM Transactions on Computer Systems, 2010, 28(4): Article No.8.

  96. Chen X E, Aamodt T M. A first-order fine-grained multi-threaded throughput model. In Proc. HPCA, Feb. 2009, pp.329-340.

  97. Xie Y, Loh G H. Dynamic classification of program memory behaviors in CMPs. In Proc. CMP-MSI Workshop, June 2008.

  98. Hennessy J L, Patterson D A. Computer Architecture: A Quantitative Approach (4th edition). Morgan Kaufmann, 2006.

  99. Sun X H, Wang D. APC: A performance metric of memory systems. ACM SIGMETRICS Performance Evaluation Review, 2012, 40(2): 125-130.

  100. Zhao J, Feng X, Cui H et al. An empirical model for predicting cross-core performance interference on multicore processors. In Proc. PACT, Sept. 2013, pp.201-212.

  101. Wang W, Dey T, Davidson J W et al. DraMon: Predicting memory bandwidth usage of multi-threaded programs with high accuracy and low overhead. In Proc. HPCA, Feb. 2014.

  102. Kim M, Kumar P, Kim H, Brett B. Predicting potential speedup of serial code via lightweight profiling and emulations with memory performance model. In Proc. IPDPS, May 2012, pp.1318-1329.

  103. Zhang X, Zhong R, Dwarkadas S, Shen K. A flexible framework for throttling-enabled multicore management (TEMM). In Proc. ICPP, Sept. 2012, pp.389-398.

  104. Liu L, Cui Z, Xing M et al. A software memory partition approach for eliminating bank-level interference in multicore systems. In Proc. PACT, Sept. 2012, pp.367-376.

  105. Jiang Y, Tian K, Shen X, Zhang J, Chen J, Tripathi R. The complexity of optimal job co-scheduling on chip multiprocessors and heuristics-based solutions. IEEE Trans. Parallel and Distributed Systems, 2011, 22(7): 1192-1205.

  106. Jiang Y, Shen X, Chen J, Tripathi R. Analysis and approximation of optimal co-scheduling on chip multiprocessors. In Proc. PACT, Oct. 2008, pp.220-229.

  107. Snavely A, Tullsen D M. Symbiotic jobscheduling for a simultaneous multithreading processor. In Proc. ASPLOS, Nov. 2000, pp.234-244.

  108. Shen K. Request behavior variations. In Proc. ASPLOS, Mar. 2010, pp.103-116.

  109. Knauerhase R, Brett P, Hohlt B, Li T, Hahn S. Using OS observations to improve performance in multicore systems. IEEE Micro, 2008, 28(3): 54-66.

  110. Denning P J. Equipment configuration in balanced computer systems. IEEE Transactions on Computers, 1969, C-18(11): 1008-1012.

  111. Wulf W A. Performance monitors for multi-programming systems. In Proc. the ACM Symposium on Operating System Principles, Oct. 1969, pp.175-181.

  112. Mars J, Tang L, Skadron K, Soffa M L, Hundt R. Increasing utilization in modern warehouse-scale computers using bubble-up. IEEE Micro, 2012, 32(3): 88-99.

  113. Delimitrou C, Kozyrakis C. Paragon: QoS-aware scheduling for heterogeneous datacenters. In Proc. ASPLOS, March 2013, pp.77-88.

  114. Ahn D H, Vetter J S. Scalable analysis techniques for micro-processor performance counter metrics. In Proc. ACM/IEEE Conf. Supercomputing, Nov. 2002.

  115. Rodríguez G, Badia R M, Labarta J. Generation of simple analytical models for message passing applications. In Proc. Euro-Par, Aug. 31-Sept. 3, 2004, pp.183-188.

  116. Jacquet A, Janot V, Leung C et al. An executable analytical performance evaluation approach for early performance prediction. In Proc. IPDPS, April 2003.

  117. Miller B P, Callaghan M D, Cargille J M et al. The Paradyn parallel performance measurement tool. IEEE Computer, 1995, 28(11): 37-46.

  118. Kerbyson D J, Hoisie A, Wasserman H J. Modelling the performance of large-scale systems. IEE Proceedings Software, 2003, 150(4): 214-222.

  119. Wall D W. Predicting program behavior using real or estimated profiles. In Proc. PLDI, June 1991, pp.59-70.

  120. Tian K, Jiang Y, Zhang E Z, Shen X. An input-centric paradigm for program dynamic optimizations. In Proc. OOPSLA, Oct. 2010, pp.125-139.

  121. Shen X, Zhong Y, Ding C. Regression-based multi-model prediction of data reuse signature. In Proc. the 4th Annual Symposium of the Los Alamos Computer Science Institute, Oct. 2003.

  122. Marin G, Mellor-Crummey J. Scalable cross-architecture predictions of memory hierarchy response for scientific applications. In Proc. the Symposium of the Los Alamos Computer Science Institute, Oct. 2005.

  123. Shen X, Ding C. Parallelization of utility programs based on behavior phase analysis. In Proc. the International Workshop on Languages and Compilers for Parallel Computing, Oct. 2005, pp.425-432.

  124. Shen X, Zhong Y, Ding C. Locality phase prediction. In Proc. ASPLOS, Oct. 2004, pp.165-176.

  125. Shen X, Zhong Y, Ding C. Predicting locality phases for dynamic memory optimization. Journal of Parallel and Distributed Computing, 2007, 67(7): 783-796.

  126. Mao F, Shen X. Cross-input learning and discriminative prediction in evolvable virtual machines. In Proc. CGO, Mar. 2009, pp.92-101.

  127. Jiang Y, Zhang E Z, Tian K et al. Exploiting statistical correlations for proactive prediction of program behaviors. In Proc. the 8th CGO, April 2010, pp.248-256.

  128. Cavazos J, Moss J E B. Inducing heuristics to decide whether to schedule. In Proc. PLDI, June 2004, pp.183-194.

  129. Wu B, Zhao Z, Shen X, Jiang Y, Gao Y, Silvera R. Exploiting inter-sequence correlations for program behavior prediction. In Proc. OOPSLA, Oct. 2012, pp.851-866.

  130. Arnold M, Welc A, Rajan V T. Improving virtual machine performance using a cross-run profile repository. In Proc. OOPSLA, Oct. 2005, pp.297-311.

  131. Tian K, Zhang E Z, Shen X. A step towards transparent integration of input-consciousness into dynamic program optimizations. In Proc. OOPSLA, Oct. 2011, pp.445-462.

  132. Chen Y, Huang Y, Eeckhout L et al. Evaluating iterative optimization across 1000 datasets. In Proc. PLDI, June 2010, pp.448-459.

  133. Wu B, Zhou M, Shen X et al. Simple profile rectifications go a long way – Statistically exploring and alleviating the effects of sampling errors for program optimizations. In Proc. the European Conference on Object-Oriented Programming, July 2013, pp.654-678.

  134. Srivastava A, Eustace A. ATOM: A system for building customized program analysis tools. In Proc. PLDI, June 1994, pp.196-205.

  135. Luk C, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi V J, Hazelwood K. Pin: Building customized program analysis tools with dynamic instrumentation. In Proc. PLDI, June 2005, pp.190-200.

  136. Meira W Jr, LeBlanc T, Poulos A. Waiting time analysis and performance visualization in Carnival. In Proc. ACM SIGMETRICS Symposium on Parallel and Distributed Tools, May 1996.

  137. Reed D A, Elford C L, Madhyastha T M, Smirni E, Lamm S E. The next frontier: Interactive and closed loop performance steering. In Proc. ICPP Workshop, Aug. 1996, pp.20-31.

  138. Darema-Rogers F, Pfister G F, So K. Memory access patterns of parallel scientific programs. In Proc. SIGMETRICS, May 1987, pp.46-58.

  139. Browne S, Dongarra J, Garner N, Ho G, Mucci P. A portable programming interface for performance evaluation on modern processors. The International Journal of High Performance Computing Applications, 2000, 14(3): 189-204.

  140. Adhianto L, Banerjee S, Fagan M, Krentel M, Marin G, Mellor-Crummey J, Tallent N R. HPCTOOLKIT: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 2010, 22(6): 685-701.

  141. Shende S, Malony A D. The TAU parallel performance system. International Journal of High Performance Computing Applications, 2006, 20(2): 287-311.

  142. Schulz M, Galarowicz J, Maghrak D, Hachfeld W, Montoya D, Cranford S. Open|SpeedShop: An open source infrastructure for parallel performance analysis. Scientific Programming, 2008, 16(2/3): 105-121.

  143. Hauswirth M, Sweeney P F, Diwan A. Temporal vertical profiling. Software: Practice and Experience, 2010, 40(8): 627-654.

  144. Childers B, Davidson J, Soffa M L. Continuous compilation: A new approach to aggressive and adaptive code transformation. In Proc. Symp. Parallel and Distributed Processing, April 2003.

  145. Cascaval C, Duesterwald E, Sweeney P F, Wisniewski R W. Performance and environment monitoring for continuous program optimization. IBM Journal of Research and Development, 2006, 50(2/3): 239-248.

  146. McCurdy C, Vetter J S. Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms. In Proc. ISPASS, March 2010, pp.87-96.

  147. Liu X, Mellor-Crummey J M. Pinpointing data locality problems using data-centric analysis. In Proc. the 9th CGO, April 2011, pp.171-180.

  148. Liu X, Mellor-Crummey J. A tool to analyze the performance of multithreaded programs on NUMA architectures. In Proc. the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb. 2014, pp.259-272.

  149. Zhuang X, Serrano M J, Cain H W, Choi J. Accurate, efficient, and adaptive calling context profiling. In Proc. PLDI, June 2006, pp.263-271.

  150. Ding C, Yuan L. Program interaction on multicore: Theory and applications. Computer Engineering and Science, 2014, 36(1): 1-5. (In Chinese)

Author information

Corresponding author

Correspondence to Chen Ding.

Additional information

The work is partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61232008, the NSFC Joint Research Fund for Overseas Chinese Scholars and Scholars in Hong Kong and Macao under Grant No. 61328201, the National Science Foundation of USA under Contract Nos. CNS-1319617, CCF-1116104, CCF-0963759, an IBM CAS Faculty Fellowship and a research grant from Huawei. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding organizations.

Xiang has graduated and is now working at Twitter Inc. Bao has graduated and is now working at Qualcomm Inc.

Electronic supplementary material

ESM 1 (PDF, 84 KB)

About this article

Cite this article

Ding, C., Xiang, X., Bao, B. et al. Performance Metrics and Models for Shared Cache. J. Comput. Sci. Technol. 29, 692–712 (2014). https://doi.org/10.1007/s11390-014-1460-7

  • DOI: https://doi.org/10.1007/s11390-014-1460-7
