ABSTRACT
Cryogenic computing, which runs a computer device at an extremely low temperature, is promising because it significantly reduces both wire resistance and leakage current. Recent studies on cryogenic computing have focused on various architectural units running at 77 K, including the main memory, caches, and CPU cores. However, little research has been conducted to fully exploit fast cryogenic wires, even though slow wires are becoming an increasingly serious performance bottleneck in modern processors. In this paper, we propose a CPU microarchitecture that extensively exploits fast wires at 77 K. To this end, we first introduce our validated cryogenic performance models for the CPU pipeline and the network-on-chip (NoC), both of which can be significantly limited by slow wires. Next, based on analysis with these models, we architect CryoSP and CryoBus, our pipeline and NoC designs, respectively, to fully exploit the fast wires. Our evaluation shows that a cryogenic computer equipped with both microarchitectures achieves 3.82 times higher system-level performance than a conventional computer system, thanks to the 96% higher clock frequency of CryoSP and the five times lower NoC latency of CryoBus.
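The premise that lower wire resistance yields faster wires can be illustrated with a first-order Elmore-delay estimate. The sketch below is not the paper's validated cryogenic model. It assumes an unrepeated wire whose capacitance does not change with temperature, and every number in it (resistance and capacitance per mm, wire length, and a 77 K/300 K resistance ratio of 0.25) is a hypothetical placeholder. The real resistance reduction depends on wire geometry and is not stated here.

```python
"""Illustrative Elmore-delay estimate for an unrepeated on-chip wire.

All parameter values are hypothetical placeholders, not figures from the paper.
"""


def elmore_wire_delay(r_per_mm: float, c_per_mm: float, length_mm: float) -> float:
    """Elmore delay (seconds) of a distributed RC line: 0.5 * R_total * C_total.

    r_per_mm:  wire resistance per mm (ohm/mm)
    c_per_mm:  wire capacitance per mm (F/mm)
    length_mm: wire length (mm)
    """
    r_total = r_per_mm * length_mm
    c_total = c_per_mm * length_mm
    # Both R and C grow with length, so delay grows quadratically with length.
    return 0.5 * r_total * c_total


def cooled_delay(delay_300k: float, resistance_ratio: float) -> float:
    """Scale a room-temperature wire delay by R(77 K) / R(300 K).

    Assumption: capacitance is temperature-independent, so an unrepeated
    wire's delay scales linearly with its resistance.
    """
    return delay_300k * resistance_ratio


if __name__ == "__main__":
    # Hypothetical wire: 1 kohm/mm, 0.2 pF/mm, 2 mm long.
    d300 = elmore_wire_delay(r_per_mm=1e3, c_per_mm=0.2e-12, length_mm=2.0)
    # Hypothetical resistance ratio R(77 K) / R(300 K) = 0.25.
    d77 = cooled_delay(d300, resistance_ratio=0.25)
    print(f"300 K: {d300 * 1e12:.1f} ps, 77 K: {d77 * 1e12:.1f} ps")
```

Because this delay grows with the square of wire length, long wires in pipeline loops and NoC links are the ones that benefit most from a resistance reduction. Repeated wires scale less than linearly with resistance, so this unrepeated case is an upper bound on the speedup under these assumptions.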
Index Terms
- CryoWire: wire-driven microarchitecture designs for cryogenic computing