ABSTRACT
Conventional AI accelerators are limited by the von Neumann bottleneck for edge workloads. Domain-specific, often neuromorphic, accelerators address this with near-/in-memory computing, massively multicore designs interconnected by a network-on-chip (NoC), and data-flow computation. This requires an effective mapping of neural networks (i.e., an assignment of network layers to cores) that balances memory, computation, and NoC traffic. Here, we introduce a mapping called Snake for the predominant convolutional neural networks (CNNs). It exploits the feed-forward nature of CNNs by folding consecutive layers onto spatially adjacent cores. We achieve a total NoC bandwidth improvement of up to 3.8x for MobileNet and ResNet versus random mappings. Furthermore, we propose NEWROMAP, which further optimizes the Snake mapping with a meta-heuristic; it also simulates the NoC traffic and works with TensorFlow models. In simulations, this reduces communication latency by up to 22.52% compared with the pure Snake mapping.
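The core idea of a snake (boustrophedon) mapping can be illustrated with a short sketch. This is not the paper's implementation; the grid dimensions, the round-robin wrap-around for overflow layers, and the function names are illustrative assumptions. It shows how placing consecutive layers along a serpentine path over a 2D mesh keeps layer-to-layer traffic between neighboring cores.

```python
# Illustrative sketch (assumed, not the paper's code): map consecutive CNN
# layers onto a 2D NoC mesh in serpentine ("snake") order, so that the
# feed-forward traffic between adjacent layers travels only short hops.

def snake_order(rows, cols):
    """Return mesh coordinates (row, col) in boustrophedon order:
    left-to-right on even rows, right-to-left on odd rows."""
    coords = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        coords.extend((r, c) for c in cs)
    return coords

def map_layers(num_layers, rows, cols):
    """Assign layer i to the i-th core along the snake path; if there are
    more layers than cores, wrap around (a simplifying assumption)."""
    path = snake_order(rows, cols)
    return {i: path[i % len(path)] for i in range(num_layers)}

mapping = map_layers(num_layers=6, rows=2, cols=3)
# Consecutive layers land on cores exactly one hop apart, e.g.
# layer 0 -> (0, 0), layer 1 -> (0, 1), layer 2 -> (0, 2), layer 3 -> (1, 2).
```

Because the path reverses direction on each row, every pair of consecutive layers has a Manhattan distance of one hop, which is the property a random mapping lacks.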
Index Terms
- NEWROMAP: mapping CNNs to NoC-interconnected self-contained data-flow accelerators for edge-AI