A workload independent energy reduction strategy for D-NUCA caches

Foglia, Pierfrancesco; Comparetti, Manuel

doi:10.1007/s11227-013-1033-5

Published: 24 October 2013

The Journal of Supercomputing, Volume 68, pages 157–182 (2014)

Abstract

Wire delays and leakage energy consumption are both growing problems in the design of large on-chip caches built in deep-submicron technologies. D-NUCA (Dynamic Non-Uniform Cache Architecture) caches use aggressive sub-banking and a migration mechanism that reduces the access latency of frequently accessed data, limiting the effect of wire delays on performance. Way Adaptable D-NUCA is a leakage power reduction technique specifically suited for D-NUCA caches. It dynamically varies the portion of the cache area that is powered on, based on the caching needs of the running workload, but it relies on application-dependent parameters that must be evaluated off-line. This limits the effectiveness of Way Adaptable D-NUCA in general-purpose, multiprogrammed environments. In this paper, we propose a new power reduction technique for D-NUCA caches that still adapts the powered-on cache area to the needs of the running workload, but does not rely on application-dependent parameters. Results show that our proposal saves around 49 % of total cache energy consumption in a single-core environment and 44 % in a CMP environment. When augmented with a timer, it performs similarly to previously proposed leakage reduction techniques, and outperforms them when they are applied in a workload-independent manner.
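The migration and way-adaptation ideas summarized in the abstract can be illustrated with a small simulation sketch. This is hypothetical code, not the authors' mechanism. It assumes a single bank set whose way 0 is the bank nearest the cache controller, tail insertion on a miss, one-step promotion toward the controller on every hit, and a threshold controller (`adapt_ways`, with invented parameters `low` and `high`) that powers the farthest way on or off according to the share of hits it serves. The fixed thresholds stand in for the kind of application-dependent parameters the abstract describes; the paper's workload-independent policy is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DNucaSet:
    """Hypothetical model of one D-NUCA bank set.

    ways[0] is the bank closest to the cache controller; only the first
    `active_ways` ways are powered on.
    """
    num_ways: int
    active_ways: int
    ways: List[Optional[str]] = field(init=False)
    hits_per_way: List[int] = field(init=False)

    def __post_init__(self) -> None:
        if not 1 <= self.active_ways <= self.num_ways:
            raise ValueError("active_ways must be in [1, num_ways]")
        self.ways = [None] * self.num_ways
        self.hits_per_way = [0] * self.num_ways

    def access(self, tag: str) -> bool:
        """Look up `tag` in the powered-on ways; return True on a hit."""
        for i in range(self.active_ways):
            if self.ways[i] == tag:
                self.hits_per_way[i] += 1
                if i > 0:
                    # Gradual promotion: swap with the block one bank closer.
                    self.ways[i - 1], self.ways[i] = self.ways[i], self.ways[i - 1]
                return True
        # Miss (assumed tail insertion): place the block in the farthest
        # powered-on way, evicting whatever was there.
        self.ways[self.active_ways - 1] = tag
        return False


def adapt_ways(s: DNucaSet, low: float = 0.01, high: float = 0.05) -> int:
    """Hypothetical threshold controller, run at the end of each interval.

    `low` and `high` are assumed, fixed parameters: the fraction of hits
    served by the farthest powered-on way decides whether to grow or shrink.
    """
    total = sum(s.hits_per_way)
    if total == 0:
        return s.active_ways
    far_share = s.hits_per_way[s.active_ways - 1] / total
    if far_share > high and s.active_ways < s.num_ways:
        s.active_ways += 1  # the far way is busy: power on one more way
    elif far_share < low and s.active_ways > 1:
        # Gating the supply of a way loses its contents; write-back of
        # dirty blocks is ignored in this sketch.
        s.ways[s.active_ways - 1] = None
        s.active_ways -= 1
    s.hits_per_way = [0] * s.num_ways  # start a new observation interval
    return s.active_ways


if __name__ == "__main__":
    s = DNucaSet(num_ways=4, active_ways=4)
    for _ in range(3):
        s.access("A")
    print(s.ways)  # [None, 'A', None, None]
```

Because `low` and `high` are constants chosen ahead of time, a setting that suits one workload may keep too many ways powered, or switch off ways another workload still needs; this illustrates why off-line, per-application tuning is a limitation in multiprogrammed environments.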

Author information

Authors and Affiliations

Dipartimento di Ingegneria dell’Informazione, Università di Pisa, Via Diotisalvi 2, 56126, Pisa, Italy
Pierfrancesco Foglia
R&D, ION Trading, Via S. Martino 54, 56125, Pisa, Italy
Manuel Comparetti

Corresponding author

Correspondence to Pierfrancesco Foglia.

About this article

Cite this article

Foglia, P., Comparetti, M. A workload independent energy reduction strategy for D-NUCA caches. J Supercomput 68, 157–182 (2014). https://doi.org/10.1007/s11227-013-1033-5

Issue Date: April 2014
