CALM: Contention-Aware Latency-Minimal Application Mapping for Flattened Butterfly On-Chip Networks

Published: 26 December 2016

Abstract

With the emergence of many-core multiprocessor systems-on-chip (MPSoCs), on-chip networks face serious challenges in providing fast communication among tasks and cores. One promising design approach shown in recent studies is to add express channels to a traditional mesh network as shortcuts that bypass intermediate routers, thereby reducing packet latency. This approach not only changes the packet latency model but also greatly affects network traffic behavior, and neither effect has been fully exploited by existing mapping algorithms. In this article, we explore the opportunities for optimizing application mapping on the flattened butterfly, a popular express channel-based on-chip network. Specifically, we identify the unique characteristics of the flattened butterfly, analyze the opportunities and new challenges, and propose an efficient heuristic mapping algorithm. The proposed algorithm, Contention-Aware Latency-Minimal (CALM), reduces unnecessary turns that would otherwise impose additional router pipeline latency on packets, and adjusts forwarding traffic to reduce network contention latency. Simulation results show that the proposed algorithm achieves, on average, a 3.4X reduction in the number of turns, a 24.8% reduction in contention latency, and a 14.12% reduction in overall packet latency.
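The turn count mentioned in the abstract can be illustrated with a small sketch. It is not the CALM algorithm itself, only a possible cost metric under stated assumptions: a 2D flattened butterfly in which each router links directly to every other router in its row and column, and dimension-order routing, so a packet makes a turn exactly when its source and destination tiles differ in both row and column. The task names, traffic volumes, and function names below are hypothetical.

```python
from typing import Dict, Tuple

Tile = Tuple[int, int]  # (row, column) of a router; assumed 2D flattened butterfly


def needs_turn(src: Tile, dst: Tile) -> bool:
    """Under the assumed dimension-order routing, a packet turns only
    when source and destination differ in both dimensions."""
    return src[0] != dst[0] and src[1] != dst[1]


def weighted_turns(traffic: Dict[Tuple[str, str], float],
                   mapping: Dict[str, Tile]) -> float:
    """Sum the traffic volume of every flow that must make a turn."""
    return sum(volume
               for (src_task, dst_task), volume in traffic.items()
               if needs_turn(mapping[src_task], mapping[dst_task]))


# Hypothetical three-task application; the volumes are made-up numbers.
traffic = {("a", "b"): 10.0, ("b", "c"): 4.0, ("a", "c"): 1.0}

# Two candidate task-to-tile mappings for the same application.
scattered = {"a": (0, 0), "b": (1, 1), "c": (0, 2)}
aligned = {"a": (0, 0), "b": (0, 1), "c": (1, 1)}

print(weighted_turns(traffic, scattered))  # heavy flows a->b and b->c both turn
print(weighted_turns(traffic, aligned))    # only the light flow a->c turns
```

Placing heavily communicating tasks in the same row or column removes their turns, which is the kind of effect a turn-aware mapping tries to exploit; contention is not modeled in this sketch.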

Cited By

  • (2018) Centralized Priority Management Allocation for Network-on-Chip Router. 2018 31st IEEE International System-on-Chip Conference (SOCC), 290-295. DOI: 10.1109/SOCC.2018.8618484. Online publication date: Sep-2018

    Published In

    ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 2
    Special Section of IDEA: Integrating Dataflow, Embedded Computing, and Architecture
    April 2017
    458 pages
    ISSN:1084-4309
    EISSN:1557-7309
    DOI:10.1145/3029795
    • Editor: Naehyuck Chang

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 December 2016
    Accepted: 01 May 2016
    Revised: 01 May 2016
    Received: 01 October 2015
    Published in TODAES Volume 22, Issue 2

    Author Tags

    1. Network on chip
    2. application mapping
    3. contention awareness

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NSF's Directorate for Computer & Information Science & Engineering
    • Software and Hardware Foundations
