ABSTRACT
Memory cubes (MCs), following the general concept of Micron's Hybrid Memory Cube (HMC), represent a promising memory architecture for high scalability and energy efficiency due to their partitioned 3D-stacked DRAM, high-speed serial links, abstract packet-switched interface, and on-die switching fabric connecting the host processor with the MC's different partitions ('vaults'). While previous studies have shown that implementing processor-to-MC links with silicon-photonic (SiP) integrated optical links offers higher energy efficiency and bandwidth density, they keep the electrical switching fabric inside the MC die and perform signal conversion prior to routing packets across the switch.
We believe that the technological limitations of electrical interconnects in terms of energy consumption, the large size of MC dies, the high radix of the on-die switch, and the bandwidth demands will all ultimately turn the on-MC switching fabric into a critical issue for both energy and latency. Using an integrated optical switching fabric alleviates all of these issues and allows the host processor to communicate directly with the MC vaults by exploiting wavelength routing, thereby eliminating the need for electrical switch traversal. In particular, we propose to use Arrayed Waveguide Grating Routers (AWGRs), which offer a compact SiP switching fabric with a connectivity pattern that is ideal as the on-MC switch. Our simulation results show that exploiting AWGRs and direct processor-to-vault communication reduces both MC access energy and latency by up to 40% (on average) on PARSEC/SPLASH-2 workloads.
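The wavelength routing mentioned above can be illustrated with a minimal sketch. It assumes a cyclic N x N AWGR in which a signal entering input port i on wavelength index w leaves on output port (i + w) mod N. This modular convention is a common textbook description of cyclic AWGRs, not a detail given in the abstract. The function names and the 4-port size are hypothetical and chosen only for illustration.

```python
from typing import List


def awgr_output_port(input_port: int, wavelength: int, n_ports: int) -> int:
    """Output port reached from `input_port` on wavelength index `wavelength`.

    Assumed convention (not taken from the paper): out = (in + wavelength) mod N.
    """
    if not (0 <= input_port < n_ports and 0 <= wavelength < n_ports):
        raise ValueError("port and wavelength indices must lie in [0, n_ports)")
    return (input_port + wavelength) % n_ports


def wavelength_for(input_port: int, output_port: int, n_ports: int) -> int:
    """Wavelength index a sender must use to reach `output_port` (inverse mapping)."""
    if not (0 <= input_port < n_ports and 0 <= output_port < n_ports):
        raise ValueError("port indices must lie in [0, n_ports)")
    return (output_port - input_port) % n_ports


def routing_table(n_ports: int) -> List[List[int]]:
    """table[i][o] = wavelength index that input i uses to reach output o."""
    return [
        [wavelength_for(i, o, n_ports) for o in range(n_ports)]
        for i in range(n_ports)
    ]


if __name__ == "__main__":
    # Hypothetical 4-port example: rows are inputs (e.g. processor-side ports),
    # columns are outputs (e.g. vault-side ports).
    for row in routing_table(4):
        print(row)
```

Under this assumed mapping, every input reaches every output on a distinct wavelength, and no two inputs use the same wavelength towards the same output. The sender therefore selects the destination purely by choosing a wavelength, with no electrical switch traversal in between.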
Index Terms
- AWGR-based optical processor-to-memory communication for low-latency, low-energy vault accesses