ABSTRACT
In the era of big data, demand for server memory capacity and performance keeps growing. A memory network is a promising alternative that provides high bandwidth and low latency through distributed memory nodes connected by high-speed interconnect. However, most existing memory network designs work purely at the logic level and ignore the physical impact of network interconnect latency, processor placement, and the interplay between processor and memory. In this work, we propose a physical-aware framework for memory network design space exploration, which facilitates the design of an energy-efficient, physical-aware memory network system. Experimental results on various workloads show that the proposed framework can help customize the network topology, yielding significant improvements on various design metrics compared with other commonly used topologies.
Index Terms
- A Physical-Aware Framework for Memory Network Design Space Exploration