Abstract
This paper addresses energy consumption, a major problem in the dark silicon era. Energy consumption has become a key issue in the operation and maintenance of cloud data centers and a growing concern for cloud computing providers. We show how spin-transfer torque random access memory (STT-RAM) can be used as an on-chip L2 cache to consume less energy than a conventional SRAM L2 cache. High density, fast read access, and non-volatility make STT-RAM a promising technology for on-chip memories. Previous studies have mainly examined specific schemes based on common applications and do not provide a thorough analysis of emerging scale-out applications across multiple design options. We evaluate both performance and energy efficiency in cloud processors running emerging scale-out workloads. Experimental results on the CloudSuite benchmarks show that, compared to the SRAM baseline, the proposed method reduces energy by 51% and improves the energy-delay product by 37% on average, at the cost of an average 22% degradation in instructions per cycle.
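The three averages reported in the abstract can be related through the definition of the energy-delay product (EDP = energy × delay). The sketch below is a rough consistency check, not the authors' evaluation method. It assumes that execution time scales as the inverse of instructions per cycle (fixed instruction count and clock frequency) and that the per-benchmark averages compose multiplicatively, which holds only approximately. The function name `edp_improvement` is hypothetical.

```python
def edp_improvement(energy_reduction: float, ipc_degradation: float) -> float:
    """Estimate the fractional EDP improvement over a baseline.

    Assumption (not from the paper): delay is proportional to 1 / IPC,
    i.e. instruction count and clock frequency are unchanged.
    """
    relative_energy = 1.0 - energy_reduction        # energy vs. baseline
    relative_delay = 1.0 / (1.0 - ipc_degradation)  # slower when IPC drops
    relative_edp = relative_energy * relative_delay
    return 1.0 - relative_edp


# Figures from the abstract: 51% energy reduction, 22% IPC degradation.
print(f"{edp_improvement(0.51, 0.22):.1%}")  # prints "37.2%"
```

Under these assumptions the estimate lands close to the 37% EDP improvement reported in the abstract, which suggests the three figures are mutually consistent.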
About this article
Cite this article
Nasri, A., Fathy, M. & Broumandnia, A. An energy-efficient 3D-stacked STT-RAM cache architecture for cloud processors: the effect on emerging scale-out workloads. J Supercomput 74, 1547–1561 (2018). https://doi.org/10.1007/s11227-017-2180-x
DOI: https://doi.org/10.1007/s11227-017-2180-x