DOI: 10.1145/3330345.3330381
research-article

DeepHiR: improving high-radix router throughput with deep hybrid memory buffer microarchitecture

Published: 26 June 2019

Abstract

Hierarchical high-radix router microarchitectures built from small SRAM-based intermediate buffers have been used in the interconnection networks of large-scale supercomputers. While the hierarchical organization enables efficient scaling to higher switch port counts, it requires intermediate buffers that can become a performance bottleneck. Shallow intermediate buffers can cause head-of-line blocking and create backpressure towards the input buffers, reducing overall performance. Increasing the intermediate buffer size overcomes this problem but is infeasible, since the total amount of intermediate buffering is proportional to O(p^2), where p is the router radix. Adopting a new memory technology with higher density can increase intermediate buffer size, but this is not practical for decentralized, small intermediate buffers.
In this work, we propose to organize the decentralized intermediate buffers as centralized buffers and leverage an alternative memory technology to increase the buffer capacity of high-radix routers. In particular, we exploit Spin-Torque Transfer Magnetic RAM (STT-MRAM) to provide high density and increase intermediate buffer depths while providing near-zero leakage power. However, STT-MRAM incurs significant overhead when it must provide the larger number of read/write ports necessary to support speedup. To overcome this cost, we propose DeepHiR, a novel deep hybrid buffer organization (STT-MRAM and SRAM) combined with a centralized buffer organization to provide high performance with minimal cost.
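
The quadratic growth of intermediate buffering with router radix can be illustrated with a small counting sketch. It is not taken from the paper: it assumes a hierarchical crossbar tiled into k x k subswitches, with one input-side and one output-side intermediate buffer per subswitch port. The function name and the parameters `subswitch_size`, `vcs`, and `depth` are hypothetical and chosen only for illustration.

```python
def intermediate_buffer_entries(radix, subswitch_size, vcs=1, depth=1):
    """Count intermediate buffer entries in a tiled (hierarchical) crossbar.

    Assumption (not from the paper): the p x p crossbar is split into
    (p / k)^2 subswitches of size k x k, and every subswitch port has
    one intermediate buffer on its input side and one on its output side.
    """
    if radix % subswitch_size != 0:
        raise ValueError("radix must be a multiple of subswitch_size")
    tiles_per_side = radix // subswitch_size
    num_subswitches = tiles_per_side ** 2
    # k input-side buffers plus k output-side buffers per subswitch.
    buffers_per_subswitch = 2 * subswitch_size
    return num_subswitches * buffers_per_subswitch * vcs * depth


if __name__ == "__main__":
    # With the subswitch size held fixed, doubling the radix
    # quadruples the number of intermediate buffer entries.
    for p in (32, 64, 128):
        print(p, intermediate_buffer_entries(p, subswitch_size=8))
```

Under these assumptions the count is 2 * p^2 / k entries per virtual channel and per unit of depth, so deepening every intermediate buffer multiplies a total that already grows quadratically with p.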

Cited By

  • (2023) VVQ: Virtualizing Virtual Channel for Cost-Efficient Protocol Deadlock Avoidance. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1072-1084. DOI: 10.1109/HPCA56546.2023.10071059. Online publication date: Feb-2023
  • (2022) Hybrid Memory Buffer Microarchitecture for High-Radix Routers. IEEE Transactions on Computers 71:11, 2888-2902. DOI: 10.1109/TC.2021.3076431. Online publication date: 1-Nov-2022
  • (2021) CIB-HIER. ACM Transactions on Architecture and Code Optimization 18:4, 1-21. DOI: 10.1145/3468062. Online publication date: 17-Jul-2021

Published In

ICS '19: Proceedings of the ACM International Conference on Supercomputing
June 2019
533 pages
ISBN:9781450360791
DOI:10.1145/3330345

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

  • National Science and Technology Major Projects on Core Electronic Devices, High-End Generic Chips and Basic Software

Conference

ICS '19

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 16
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 03 Mar 2025
