PS-Cache: an energy-efficient cache design for chip multiprocessors

The Journal of Supercomputing

Abstract

Power consumption has become a major design concern in current high-performance chip multiprocessors, and the problem worsens as core counts grow. On-chip caches often consume a significant fraction of the total power budget, so considerable research has focused on reducing energy consumption in these structures. To enhance performance, on-chip caches are being deployed with a high degree of associativity. Consequently, concurrently accessing all the ways of a cache set is costly in terms of energy. This paper presents the PS-Cache architecture, an energy-efficient cache design that reduces the number of accessed ways without hurting performance. The PS-Cache exploits knowledge of whether the referenced block is private or shared, and saves energy by accessing only those ways that hold blocks of the kind being looked up. Experimental results show that, on average, the PS-Cache architecture can reduce the dynamic energy consumption of L1 and L2 caches by 22 % and 40 %, respectively.
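
The way-filtering idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the `Way` record, its per-way `shared` bit, and the `lookup` function are hypothetical names introduced here, and the private/shared classification of the request is simply passed in as an argument, since this preview does not describe how it is obtained.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Way:
    valid: bool = False
    tag: int = 0
    shared: bool = False  # assumed per-way bit: True = shared block, False = private


def lookup(cache_set: List[Way], tag: int, shared: bool) -> Tuple[Optional[int], int]:
    """Probe only the ways whose private/shared bit matches the request.

    Returns (index of the hitting way or None, number of ways probed).
    """
    probed = 0
    hit = None
    for i, way in enumerate(cache_set):
        if way.shared != shared:
            continue  # this way holds the other kind of block, so it is not accessed
        probed += 1
        if way.valid and way.tag == tag:
            hit = i
    return hit, probed


# Illustrative 8-way set: ways 0-4 hold private blocks, ways 5-7 hold shared blocks.
example_set = [Way(valid=True, tag=0x10 + i, shared=(i >= 5)) for i in range(8)]

shared_hit, shared_probed = lookup(example_set, 0x16, shared=True)
private_hit, private_probed = lookup(example_set, 0x12, shared=False)
print(shared_hit, shared_probed)    # way index and ways probed for a shared block
print(private_hit, private_probed)  # way index and ways probed for a private block
```

In this toy set, a lookup for a shared block probes three ways and a lookup for a private block probes five, where a conventional set-associative access would read all eight. The actual savings depend on how private and shared blocks are distributed across the ways, which this sketch does not model.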

Notes

  1. Experimental environment, system parameters, protocols and cache hierarchy are described in Section 5.

  2. If a TLB miss occurs, the corresponding entry is present in the TLB once the miss has been resolved.

Acknowledgments

This work has been jointly supported by the MINECO and European Commission (FEDER funds) under the project TIN2012-38341-C04-01 and the Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia under the project Jóvenes Líderes en Investigación 18956/JLI/13.

Author information

Corresponding author

Correspondence to Joan J. Valls.

Cite this article

Valls, J.J., Ros, A., Sahuquillo, J. et al. PS-Cache: an energy-efficient cache design for chip multiprocessors. J Supercomput 71, 67–86 (2015). https://doi.org/10.1007/s11227-014-1288-5

  • DOI: https://doi.org/10.1007/s11227-014-1288-5
