AWGR-based optical processor-to-memory communication for low-latency, low-energy vault accesses

Authors:

Sebastian Werner,

Roberto Proietti,

S. J. Ben Yoo

MEMSYS '18: Proceedings of the International Symposium on Memory Systems

Pages 269 - 278

https://doi.org/10.1145/3240302.3240318

Published: 01 October 2018

Abstract

Memory cubes (MCs), following the general concept of Micron's Hybrid Memory Cube (HMC), represent a promising memory architecture for high scalability and energy efficiency due to their partitioned 3D-stacked DRAM, high-speed serial links, abstract packet-switched interface, and on-die switching fabric connecting the host processor with the MC's different partitions ('vaults'). While previous studies have shown that implementing processor-to-MC links with silicon-photonic (SiP) integrated optical links offers higher energy efficiency and bandwidth density, they keep the electrical switching fabric inside the MC die and perform signal conversion prior to routing packets across the switch.

We believe that the technological limitations of electrical interconnects in terms of energy consumption, the large size of MC dies, the high radix of the on-die switch, and the bandwidth demands will all ultimately turn the on-MC switching fabric into a critical issue in terms of energy and latency. Using an integrated optical switching fabric alleviates all of these issues and allows the host processor to communicate directly with the MC vaults by exploiting wavelength routing, thereby eliminating the need for electrical switch traversal. In particular, we propose to use Arrayed Waveguide Grating Routers (AWGRs), which offer a compact SiP switching fabric with a connectivity pattern that is ideal for the on-MC switch. Our simulation results show that exploiting AWGRs and direct processor-to-vault communication reduces both MC access energy and latency by up to 40% (on average) on PARSEC/SPLASH-2 workloads.
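The wavelength-routing idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used cyclic model of an N x N AWGR, in which a signal on wavelength index w entering input port i exits at output port (i + w) mod N. The radix of 8, the function names, and the indexing convention are illustrative assumptions and are not taken from the paper; real devices may use a different port and wavelength ordering.

```python
def awgr_output_port(input_port: int, wavelength: int, n_ports: int) -> int:
    """Output port reached by a wavelength index injected at an input port.

    Assumes the cyclic routing model (input + wavelength) mod N.
    """
    return (input_port + wavelength) % n_ports


def wavelength_for(input_port: int, output_port: int, n_ports: int) -> int:
    """Wavelength index a sender must select to reach a given output port."""
    return (output_port - input_port) % n_ports


N_PORTS = 8  # hypothetical radix, chosen only for illustration

# For every source port: map each wavelength index to the destination it reaches.
routing_table = {
    src: {wavelength_for(src, dst, N_PORTS): dst for dst in range(N_PORTS)}
    for src in range(N_PORTS)
}
```

Under this model, a sender picks its destination purely by choosing a wavelength, and on any single wavelength no two input ports map to the same output port, which is why no electrical switch traversal or arbitration inside the fabric is needed in the sketch.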

Index Terms

AWGR-based optical processor-to-memory communication for low-latency, low-energy vault accesses
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Interconnection architectures
2. Hardware
  1. Emerging technologies
    1. Emerging optical and photonic technologies

Published In

MEMSYS '18: Proceedings of the International Symposium on Memory Systems

October 2018

361 pages

ISBN: 9781450364751

DOI: 10.1145/3240302

General Chair:
Bruce Jacob
University of Maryland

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MEMSYS '18

MEMSYS '18: The International Symposium on Memory Systems

October 1 - 4, 2018

Alexandria, Virginia, USA

Article Metrics

Total Citations: 4
Total Downloads: 191
Downloads (Last 12 months): 15
Downloads (Last 6 weeks): 1

Reflects downloads up to 12 Feb 2025

Cited By

Fariborz M, Samani M, Fotouhi P, Proietti R, Yi I, Akella V, Lowe-Power J, Palermo S, Yoo S. (2022). LLM: Realizing Low-Latency Memory by Exploiting Embedded Silicon Photonics for Irregular Workloads. High Performance Computing, 44-64. Online publication date: 29-May-2022.
https://dl.acm.org/doi/10.1007/978-3-031-07312-0_3

Lin R, Cheng Y, Andrade M, Wosinska L, Chen J. (2020). Disaggregated Data Centers: Challenges and Trade-offs. IEEE Communications Magazine 58:2, 20-26. Online publication date: Feb-2020.
https://doi.org/10.1109/MCOM.001.1900612

Fotouhi P, Werner S, Lowe-Power J, Yoo S. (2019). Enabling scalable chiplet-based uniform memory architectures with silicon photonics. Proceedings of the International Symposium on Memory Systems, 222-334. Online publication date: 30-Sep-2019.
https://dl.acm.org/doi/10.1145/3357526.3357564

Werner S, Fotouhi P, Xiao X, Fariborz M, Yoo S, Michelogiannakis G, Vasudevan D. (2019). 3D photonics as enabling technology for deep 3D DRAM stacking. Proceedings of the International Symposium on Memory Systems, 206-221. Online publication date: 30-Sep-2019.
https://dl.acm.org/doi/10.1145/3357526.3357559
