CALM: Contention-Aware Latency-Minimal Application Mapping for Flattened Butterfly On-Chip Networks

Authors:

Massoud Pedram,

Lizhong Chen

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 22, Issue 2

Article No.: 21, Pages 1 - 21

https://doi.org/10.1145/2950045

Published: 26 December 2016

Abstract

With the emergence of many-core multiprocessor systems-on-chip (MPSoCs), on-chip networks face serious challenges in providing fast communication among tasks and cores. One promising on-chip network design approach shown in recent studies is to add express channels to a traditional mesh network as shortcuts that bypass intermediate routers, thereby reducing packet latency. This approach not only changes the packet latency model but also greatly affects network traffic behavior, and existing mapping algorithms have not fully exploited either effect. In this article, we explore the opportunities in optimizing application mapping for the flattened butterfly, a popular express channel-based on-chip network. Specifically, we identify the unique characteristics of the flattened butterfly, analyze the opportunities and new challenges, and propose an efficient heuristic mapping algorithm. The proposed algorithm, Contention-Aware Latency-Minimal (CALM), reduces unnecessary turns that would otherwise impose additional router pipeline latency on packets, and adjusts forwarding traffic to reduce network contention latency. Simulation results show that the proposed algorithm achieves, on average, a 3.4X reduction in the number of turns, a 24.8% reduction in contention latency, and a 14.12% reduction in overall packet latency.
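
To make the notion of a "turn" concrete, the following is a minimal sketch, not the CALM algorithm itself. It assumes a 2D flattened butterfly in which every router links directly to all other routers in its row and in its column, and it assumes dimension-order routing, so a packet takes at most one hop per dimension and turns only when source and destination differ in both coordinates. The function names, the example traffic volumes, and the volume-weighted turn count used as a proxy cost are all hypothetical illustrations rather than anything taken from the article.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]          # (x, y) position of a router/core
Flow = Tuple[str, str]           # (source task, destination task)


def mesh_hops(src: Coord, dst: Coord) -> int:
    """Hop count in a plain 2D mesh: the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def fbfly_hops(src: Coord, dst: Coord) -> int:
    """Hop count in the assumed 2D flattened butterfly.

    Express channels reach any router in the same row or column in one
    hop, so each dimension that differs costs exactly one hop.
    """
    return int(src[0] != dst[0]) + int(src[1] != dst[1])


def fbfly_turns(src: Coord, dst: Coord) -> int:
    """A packet turns once only if it must travel in both dimensions."""
    return int(src[0] != dst[0] and src[1] != dst[1])


def total_turns(traffic: Dict[Flow, int], mapping: Dict[str, Coord]) -> int:
    """Volume-weighted number of turns for a task-to-core mapping
    (an illustrative proxy cost, not the article's objective)."""
    return sum(
        volume * fbfly_turns(mapping[a], mapping[b])
        for (a, b), volume in traffic.items()
    )


# Hypothetical communication volumes between three tasks.
traffic = {("t0", "t1"): 10, ("t1", "t2"): 5, ("t0", "t2"): 1}

# Mapping A puts the heavily communicating tasks on a diagonal.
mapping_a = {"t0": (0, 0), "t1": (1, 1), "t2": (2, 0)}
# Mapping B aligns them along a shared row or column.
mapping_b = {"t0": (0, 0), "t1": (3, 0), "t2": (3, 2)}

print(total_turns(traffic, mapping_a), total_turns(traffic, mapping_b))
```

Under these assumptions, mapping B places the two heaviest flows along a shared row or column, so only the lightest flow still has to turn. This is the kind of effect that a turn-aware mapping would try to exploit.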

Cited By

P. Yan and R. Sridhar. 2018. Centralized Priority Management Allocation for Network-on-Chip Router. In 2018 31st IEEE International System-on-Chip Conference (SOCC). 290--295. Online publication date: Sep-2018.
https://doi.org/10.1109/SOCC.2018.8618484

Index Terms

Computer systems organization > Architectures > Parallel architectures > Interconnection architectures


Published In

ACM Transactions on Design Automation of Electronic Systems Volume 22, Issue 2

Special Section of IDEA: Integrating Dataflow, Embedded Computing, and Architecture

April 2017

458 pages

ISSN:1084-4309

EISSN:1557-7309

DOI:10.1145/3029795

Editor:
Naehyuck Chang
Korea Advanced Institute of Science and Technology, Korea

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 26 December 2016

Accepted: 01 May 2016

Revised: 01 May 2016

Received: 01 October 2015

Published in TODAES Volume 22, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

NSF's Directorate for Computer & Information Science & Engineering
Software and Hardware Foundations
