Abstract
Multi-core x86_64 processors introduced an important architectural change: a shared last-level cache. Historically, each processor had access to a large private cache that interfaced with main memory transparently to end users. Processes and threads previously had to compete only for memory bandwidth, but now they compete for cache space itself. Competition for space and environmental resources is a problem studied in other scientific domains. This paper introduces methods from ecology to model multi-core cache usage with the competitive Lotka–Volterra equations. A model is presented and validated for characterizing how cores interact through a shared cache, and for predicting the degree to which cache contention causes different workloads to interfere with each other's execution.
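As a rough illustration of the kind of model the abstract refers to, the sketch below integrates the textbook competitive Lotka–Volterra system, dx_i/dt = r_i x_i (1 - sum_j alpha_ij x_j / K_i), in plain Python. The reading of x_i as the cache space held by workload i, K_i as the cache capacity available to it, and alpha_ij as the interference that workload j exerts on workload i is an assumption made here for illustration; the abstract does not give the paper's actual formulation or parameter values. The function names and the numbers used are hypothetical.

```python
def lv_derivative(x, r, K, alpha):
    """Right-hand side of the competitive Lotka-Volterra system.

    x[i]        : occupancy of workload i (assumed: share of the cache it holds)
    r[i]        : intrinsic growth rate of workload i
    K[i]        : carrying capacity of workload i (assumed: usable cache size)
    alpha[i][j] : effect of workload j on workload i (alpha[i][i] is 1)
    """
    n = len(x)
    return [
        r[i] * x[i] * (1.0 - sum(alpha[i][j] * x[j] for j in range(n)) / K[i])
        for i in range(n)
    ]


def rk4_step(x, r, K, alpha, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = lv_derivative(x, r, K, alpha)
    k2 = lv_derivative([xi + 0.5 * dt * k for xi, k in zip(x, k1)], r, K, alpha)
    k3 = lv_derivative([xi + 0.5 * dt * k for xi, k in zip(x, k2)], r, K, alpha)
    k4 = lv_derivative([xi + dt * k for xi, k in zip(x, k3)], r, K, alpha)
    return [
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def simulate(x0, r, K, alpha, dt=0.01, steps=20000):
    """Integrate from x0 for `steps` steps and return the final state."""
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(x, r, K, alpha, dt)
    return x


# Hypothetical two-workload example: a normalized cache (K = 1) and
# symmetric, moderate interference between the two workloads.
final_shares = simulate(
    x0=[0.1, 0.2],
    r=[1.0, 1.0],
    K=[1.0, 1.0],
    alpha=[[1.0, 0.5], [0.5, 1.0]],
)
```

With these illustrative parameters the two occupancies settle at the coexistence equilibrium obtained by solving x_1 + 0.5 x_2 = 1 and 0.5 x_1 + x_2 = 1, which gives 2/3 each. Setting an off-diagonal alpha above 1 would instead let one workload crowd out the other, which is the sort of interference the abstract says the model is meant to predict.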
References
Agarwal A (1992) Performance tradeoffs in multithreaded processors. IEEE Trans Parallel Distrib Syst 3(5):525–539
Agarwal A, Hennessy J, Horowitz M (1989) An analytical cache model. ACM Trans Comput Syst 7:184–215
Aho AV, Denning PJ, Ullman JD (1971) Principles of optimal page replacement. J ACM 18:80–93
Antoniou S, Lambropoulou S (2008) Dynamical systems and topological surgery. ArXiv e-prints
Berryman AA (1992) The origins and evolution of predator–prey theory. Ecol Freshw Fish 73:1520–1535
Boyd-Wickizer S, Morris R, Kaashoek MF (2009) Reinventing scheduling for multicore systems. In: Proceedings of the 12th conference on Hot topics in operating systems, HotOS’09. USENIX Association, Berkeley, CA, p 21
Capitán JA, Cuesta JA (2010) Species assembly in model ecosystems, I: analysis of the population model and the invasion dynamics. J Theor Biol 269(1):330–343
Chandra D, Guo F, Kim S, Solihin Y (2005) Predicting inter-thread cache contention on a chip multi-processor architecture. In: Proceedings of the 11th international symposium on high-performance computer architecture. IEEE Computer Society, Washington, pp 340–351
Emeneker W, Apon A (2010) Cache effects of virtual machine placement on multi-core processors. In: International conference on computer and information technology, pp 2261–2266
Emeneker W, Apon A (2012) Characterising the performance of cache-aware placement of virtual machines on a multi-core architecture. Int J Ad Hoc Ubiquitous Comput 10(2):84–95
Fedorova A, Seltzer M, Smith MD (2007) Improving performance isolation on chip multiprocessors via an operating system scheduler. In: Proceedings of the 16th international conference on parallel architecture and compilation techniques, PACT ’07. IEEE Computer Society, Washington, pp 25–38
Harper JS, Kerbyson DJ, Nudd GR (1999) Analytical modeling of set-associative cache behavior. IEEE Trans Comput 48:1009–1024
Hou Z (2007) Global attractor in competitive Lotka–Volterra systems with retardation. ArXiv e-prints
Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9(3):90–95
Jiang Y, Tian K, Shen X (2010) Combining locality analysis with online proactive job co-scheduling in chip multiprocessors. In: Patt Y, Foglia P, Duesterwald E, Faraboschi P, Martorell X (eds) High performance embedded architectures and compilers, vol 5952 of lecture notes in computer science. Springer, Berlin, pp 201–215
Jones E, Oliphant T, Peterson P et al (2001) SciPy: open source scientific tools for Python (online)
Jost C, Devulder G, Peterson RO, Arditi R (2005) The wolves of Isle Royale display scale-invariant satiation and ratio-dependent predation on moose. J Anim Ecol 74(5):809–816
Kaplan SF, McGeoch LA, Cole MF (2002) Adaptive caching for demand prepaging. SIGPLAN Not 38:114–126
Kaseridis D, Stuecheli J, John LK (2009) Bank-aware dynamic cache partitioning for multicore architectures. In: International conference on parallel processing, pp 18–25
Kessler RE, Hill MD (1992) Page placement algorithms for large real-indexed caches. ACM Trans Comput Syst 10:338–359
Levon J, Elie P (2008) OProfile: a system-wide profiler for Linux systems. http://oprofile.sourceforge.net
Lin J, Lu Q, Ding X, Zhang Z, Zhang X, Sadayappan P (2008) Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems. In: IEEE 14th international symposium on high performance computer architecture, 2008. HPCA 2008, pp 367–378
Malcai O, Biham O, Richmond P, Solomon S (2002) Theoretical analysis and simulations of the generalized Lotka–Volterra model. Phys Rev E 66(3):031102/1–031102/4
Nethercote N, Seward J (2007) Valgrind: a framework for heavyweight dynamic binary instrumentation. SIGPLAN Not 42:89–100
Oden PH, Shedler GS (1972) A model of memory contention in a paging machine. Commun ACM 15:761–771
Odum E (1971) Fundamentals of ecology, 3rd edn. W. B. Saunders Co., Philadelphia
Oliver NA (1974) Experimental data on page replacement algorithm. In: Proceedings of the national computer conference and exposition, AFIPS ’74, ACM, New York, pp 179–184
Petoumenos P, Keramidas G, Zeffer H, Kaxiras S, Hagersten E (2006) Modeling cache sharing on chip multiprocessor architectures. In: IEEE International Symposium on workload characterization, 2006, pp 160–171
Qureshi MK, Patt YN (2006) Utility-based cache partitioning: a low-overhead, high-performance, runtime mechanism to partition shared caches. In: Proceedings of the 39th annual IEEE/ACM international symposium on microarchitecture, MICRO 39. IEEE Computer Society, Washington, pp 423–432
Saini S, Bailey DH (1996) NAS parallel benchmark (version 1.0) results 11-96, November 1996
Shi X, Su F, Peir J-K, Xia Y, Yang Z (2009) Modeling and stack simulation of CMP cache capacity and accessibility. IEEE Trans Parallel Distrib Syst 20:1752–1763
Smith AJ (1981) Internal scheduling and memory contention. IEEE Trans Softw Eng SE-7(1):135–146
Song F, Moore S, Dongarra J (2007) L2 cache modeling for scientific applications on chip multi-processors. In: International conference on parallel processing, 2007. ICPP 2007, p 51
Suh GE, Devadas S, Rudolph L (2001) Analytical cache models with applications to cache partitioning. In: Proceedings of the 15th international conference on supercomputing, ICS’01. ACM, New York, pp 1–12
Tam D, Azimi R, Stumm M (2007) Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors. In: Proceedings of the 2nd ACM SIGOPS/EuroSys European conference on computer systems 2007, EuroSys ’07. ACM, New York, pp 47–58
Tam DK, Azimi R, Soares LB, Stumm M (2009) RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations. SIGPLAN Not 44:121–132
Xue J, Vera X (2004) Efficient and accurate analytical modeling of whole-program data cache behavior. IEEE Trans Comput 53(5):547–566
Zhang X, Dwarkadas S, Shen K (2009) Towards practical page coloring-based multicore cache management. In: Proceedings of the 4th ACM European conference on computer systems, EuroSys ’09. ACM, New York, pp 89–102
Zhang EZ, Jiang Y, Shen X (2010) Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? In: Proceedings of the 15th ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP’10. ACM, New York, pp 203–212
Zhuravlev S, Blagodurov S, Fedorova A (2010) Addressing shared resource contention in multicore processors via scheduling. SIGPLAN Not 45:129–142
Acknowledgments
This work was supported in part by NSF grant MRI#0722625. Figures were generated with matplotlib (Hunter 2007): http://matplotlib.sf.net. The authors thank the anonymous reviewers for their helpful and insightful suggestions.
Cite this article
Emeneker, W., Apon, A. On modeling contention for shared caches in multi-core processors with techniques from ecology. Nat Comput 12, 411–428 (2013). https://doi.org/10.1007/s11047-012-9348-3
DOI: https://doi.org/10.1007/s11047-012-9348-3