research-article

CALM: Contention-Aware Latency-Minimal Application Mapping for Flattened Butterfly On-Chip Networks

Published: 26 December 2016

Abstract

With the emergence of many-core multiprocessor systems-on-chip (MPSoCs), on-chip networks face serious challenges in providing fast communication among tasks and cores. One promising design approach shown in recent studies is to add express channels to a traditional mesh network as shortcuts that bypass intermediate routers, thereby reducing packet latency. This approach not only changes the packet latency model but also greatly affects network traffic behavior, and existing mapping algorithms have fully exploited neither effect. In this article, we explore the opportunities in optimizing application mapping for the flattened butterfly, a popular express channel-based on-chip network. Specifically, we identify the unique characteristics of the flattened butterfly, analyze the opportunities and new challenges it presents, and propose an efficient heuristic mapping algorithm. The proposed algorithm, Contention-Aware Latency-Minimal (CALM), reduces unnecessary turns that would otherwise impose additional router pipeline latency on packets, and adjusts forwarding traffic to reduce network contention latency. Simulation results show that the proposed algorithm achieves, on average, a 3.4X reduction in the number of turns, a 24.8% reduction in contention latency, and a 14.12% reduction in overall packet latency.
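To make the notion of a "turn" concrete, the following minimal sketch counts traffic-weighted turns for a given task-to-router mapping. It is not the CALM algorithm. It assumes a 2D flattened butterfly in which every router has a direct channel to every other router in its row and column, with minimal routing, so a packet turns only when source and destination differ in both dimensions. The names `hops_and_turns` and `weighted_turns`, the task labels, and the traffic volumes are all hypothetical and chosen for illustration.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]  # (row, col) of a router in a 2D flattened butterfly


def hops_and_turns(src: Coord, dst: Coord) -> Tuple[int, int]:
    """Return (hops, turns) for a minimal route between two routers.

    Assumption: each router links directly to all routers in its row and
    in its column, so every dimension that differs costs exactly one hop.
    """
    hops = int(src[0] != dst[0]) + int(src[1] != dst[1])
    turns = 1 if hops == 2 else 0  # a turn occurs only on two-hop routes
    return hops, turns


def weighted_turns(traffic: Dict[Tuple[str, str], float],
                   mapping: Dict[str, Coord]) -> float:
    """Sum the communication volume of all flows whose route needs a turn."""
    total = 0.0
    for (a, b), volume in traffic.items():
        _, turns = hops_and_turns(mapping[a], mapping[b])
        total += volume * turns
    return total


# Hypothetical three-task application; volumes are made-up numbers.
traffic = {("t0", "t1"): 10.0, ("t1", "t2"): 4.0, ("t0", "t2"): 1.0}

diagonal = {"t0": (0, 0), "t1": (1, 1), "t2": (2, 2)}  # every flow turns
same_row = {"t0": (0, 0), "t1": (0, 1), "t2": (0, 2)}  # no flow turns

print(weighted_turns(traffic, diagonal))  # 15.0
print(weighted_turns(traffic, same_row))  # 0.0
```

Under these assumptions, placing heavily communicating tasks in the same row or column removes their turns entirely, which is the kind of saving a turn-aware mapping would look for. Contention on the shared row and column channels is not modeled here.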



    • Published in

      ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 2
      Special Section of IDEA: Integrating Dataflow, Embedded Computing, and Architecture
      April 2017
      458 pages
      ISSN: 1084-4309
      EISSN: 1557-7309
      DOI: 10.1145/3029795
      • Editor: Naehyuck Chang

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 26 December 2016
      • Revised: 1 May 2016
      • Accepted: 1 May 2016
      • Received: 1 October 2015
      Published in TODAES Volume 22, Issue 2


      Qualifiers

      • research-article
      • Research
      • Refereed
