Abstract
With the emergence of many-core multiprocessor systems-on-chip (MPSoCs), on-chip networks face serious challenges in providing fast communication among tasks and cores. One promising design approach shown in recent studies is to add express channels to a traditional mesh network as shortcuts that bypass intermediate routers, thereby reducing packet latency. This approach not only changes the packet latency model but also greatly affects network traffic behavior, and existing mapping algorithms have not fully exploited either effect. In this article, we explore the opportunities in optimizing application mapping for the flattened butterfly, a popular express channel-based on-chip network. Specifically, we identify the unique characteristics of the flattened butterfly, analyze the opportunities and new challenges, and propose an efficient heuristic mapping algorithm. The proposed algorithm, Contention-Aware Latency-Minimal (CALM), reduces unnecessary turns that would otherwise impose additional router pipeline latency on packets, and adjusts forwarding traffic to reduce network contention latency. Simulation results show that the proposed algorithm achieves, on average, a 3.4X reduction in the number of turns, a 24.8% reduction in contention latency, and a 14.12% reduction in overall packet latency.
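To make the notion of a "turn" concrete, the following is a minimal sketch and not the CALM algorithm itself. It assumes a 2D flattened butterfly in which every router links directly to all routers in its row and column, and dimension-order routing, so a packet takes a turn only when its source and destination differ in both row and column. The names `hops_and_turns`, `weighted_turns`, and the sample traffic volumes are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]  # (row, column) of a router; assumed 2D flattened butterfly


def hops_and_turns(src: Coord, dst: Coord) -> Tuple[int, int]:
    """Return (hops, turns) for one packet under the assumed routing model.

    Assumption: routers in the same row or column are directly connected,
    so each dimension that differs costs exactly one hop, and a turn is
    needed only when both dimensions differ.
    """
    row_differs = src[0] != dst[0]
    col_differs = src[1] != dst[1]
    hops = int(row_differs) + int(col_differs)
    turns = 1 if (row_differs and col_differs) else 0
    return hops, turns


def weighted_turns(traffic: Dict[Tuple[str, str], int],
                   mapping: Dict[str, Coord]) -> int:
    """Sum the turns over all communicating task pairs, weighted by volume."""
    total = 0
    for (src_task, dst_task), volume in traffic.items():
        _, turns = hops_and_turns(mapping[src_task], mapping[dst_task])
        total += volume * turns
    return total


# Hypothetical traffic volumes between three tasks.
traffic = {("A", "B"): 10, ("B", "C"): 5, ("A", "C"): 1}

# Two candidate mappings of tasks to routers.
diagonal = {"A": (0, 0), "B": (1, 1), "C": (2, 2)}  # every pair needs a turn
aligned = {"A": (0, 0), "B": (0, 1), "C": (1, 1)}   # heavy pairs share a row or column

print(weighted_turns(traffic, diagonal))  # 16
print(weighted_turns(traffic, aligned))   # 1
```

Under these assumptions, placing heavily communicating tasks in the same row or column removes their turns, which is the kind of saving a turn-aware mapping aims for. Contention latency is not modeled in this sketch.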
Index Terms
- CALM: Contention-Aware Latency-Minimal Application Mapping for Flattened Butterfly On-Chip Networks