An energy-efficient 3D-stacked STT-RAM cache architecture for cloud processors: the effect on emerging scale-out workloads

The Journal of Supercomputing

Abstract

This paper addresses energy consumption, a major problem in the dark silicon era. Energy consumption has become a key issue in the operation and maintenance of cloud data centers, and it is a growing concern for cloud computing providers. We show how spin-transfer torque random access memory (STT-RAM) can be used as an on-chip L2 cache to achieve lower energy consumption than a conventional SRAM L2 cache. High density, fast read access, and non-volatility make STT-RAM a promising technology for on-chip memories. Previous studies have mainly examined specific schemes using common applications and do not provide a thorough analysis of emerging scale-out applications across multiple design options. We evaluate both performance and energy efficiency in cloud processors running emerging scale-out workloads. Experimental results on the CloudSuite benchmarks show that the proposed method reduces energy by 51% on average and improves the energy-delay product by 37% on average, while instructions per cycle (IPC) degrades by 22% on average compared to the SRAM baseline.
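The three averages reported above can be related through the usual definition of the energy-delay product, EDP = energy × delay. The sketch below is illustrative and is not the authors' evaluation code. It assumes that execution time is inversely proportional to IPC (fixed instruction count and clock frequency) and that the reported averages can be combined directly, which need not hold for per-benchmark averages. The function name `relative_edp` is hypothetical.

```python
def relative_edp(energy_reduction: float, ipc_degradation: float) -> float:
    """Return the EDP of a design relative to its baseline (1.0 means equal).

    Assumption (not from the paper): delay scales as 1 / IPC, i.e. the
    instruction count and clock frequency are unchanged.
    """
    if not 0.0 <= energy_reduction < 1.0:
        raise ValueError("energy_reduction must be in [0, 1)")
    if not 0.0 <= ipc_degradation < 1.0:
        raise ValueError("ipc_degradation must be in [0, 1)")
    relative_energy = 1.0 - energy_reduction        # 51% less energy -> 0.49
    relative_delay = 1.0 / (1.0 - ipc_degradation)  # 22% lower IPC -> about 1.28x time
    return relative_energy * relative_delay


if __name__ == "__main__":
    # Averages quoted in the abstract: 51% energy reduction, 22% IPC degradation.
    edp = relative_edp(0.51, 0.22)
    print(f"relative EDP: {edp:.3f}, improvement: {(1 - edp) * 100:.1f}%")
```

Under these assumptions the relative EDP is 0.49 / 0.78, or about 0.63, which is in line with the 37% average improvement stated in the abstract.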



Correspondence to Mahmood Fathy.


Nasri, A., Fathy, M. & Broumandnia, A. An energy-efficient 3D-stacked STT-RAM cache architecture for cloud processors: the effect on emerging scale-out workloads. J Supercomput 74, 1547–1561 (2018). https://doi.org/10.1007/s11227-017-2180-x


  • DOI: https://doi.org/10.1007/s11227-017-2180-x
