Ordering circuit establishment in multiplane NoCs

Published: 25 October 2013

Abstract

Segregating networks-on-chip (NoCs) into data and control planes yields several opportunities for improving power and performance in chip-multiprocessor systems (CMPs). This article describes a hybrid packet/circuit-switched multiplane network optimized to reduce latency, which can improve system performance, reduce system energy, or both. Unlike traditional circuit preallocation techniques, which require timestamps to reserve circuit resources, this article proposes an order-based preallocation scheme. By enforcing the order in which resources are scheduled and utilized rather than a fixed time, the NoC can take advantage of messages that arrive early while naturally tolerating message delays due to contention. Ordered circuit establishment is presented using two techniques. First, Déjà Vu switching preestablishes circuits for data messages once a cache hit is detected and before the requested data becomes available. Second, with Red Carpet Routing, circuits are proactively reserved for a return data message as the request message traverses the NoC. The reduced communication latency over configured circuits enables either improved system performance or NoC energy savings, the latter by reducing voltage and frequency without sacrificing performance. In simulations of 16- and 64-core CMPs, Déjà Vu switching enabled average NoC energy savings of 43% and 53%, respectively. In simulations of communication-sensitive benchmarks, Red Carpet Routing shows execution-time speedups of up to 16%, with an average of 10% over a purely packet-switched NoC and an average of 8% over preconfiguring circuits using Déjà Vu switching.
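
The order-based idea in the abstract can be illustrated with a toy model. The following is a minimal sketch, not the article's router design: it assumes a single output port that keeps a FIFO of reservations and forwards at most one message per cycle. The names `OrderedOutputPort`, `reserve`, `try_forward`, and `simulate` are hypothetical and chosen only for illustration. Because no timestamp is stored, a message whose reservation is next may pass as soon as it arrives, and a late message simply holds its place in the order.

```python
from collections import deque


class OrderedOutputPort:
    """Toy model of one router output port with order-based reservations.

    Hypothetical illustration only; not the design from the article.
    """

    def __init__(self):
        # Input-port ids, kept in the order their circuits were reserved.
        self.reservations = deque()

    def reserve(self, input_port):
        """Record a reservation; note that no timestamp is attached."""
        self.reservations.append(input_port)

    def try_forward(self, input_port):
        """Forward a message only if its reservation is the next in order."""
        if self.reservations and self.reservations[0] == input_port:
            self.reservations.popleft()
            return True
        return False


def simulate(reservation_order, arrivals):
    """Return (cycle, input_port) pairs in the order messages leave the port.

    arrivals maps a cycle number to the input ports whose message arrives
    in that cycle. Messages that cannot pass yet wait and retry each cycle.
    """
    port = OrderedOutputPort()
    for input_port in reservation_order:
        port.reserve(input_port)

    waiting = []
    departures = []
    last_cycle = max(arrivals) + len(reservation_order)
    for cycle in range(last_cycle + 1):
        waiting.extend(arrivals.get(cycle, []))
        # Assumption: at most one message crosses the port per cycle.
        for input_port in list(waiting):
            if port.try_forward(input_port):
                waiting.remove(input_port)
                departures.append((cycle, input_port))
                break
    return departures


if __name__ == "__main__":
    # "west" reserved first but arrives late; "north" must wait for it.
    print(simulate(["west", "north"], {0: ["north"], 3: ["west"]}))
    # Both arrive early and in order; neither waits for a reserved time slot.
    print(simulate(["west", "north"], {0: ["west"], 1: ["north"]}))
```

In the first call the delayed "west" message is tolerated without invalidating any reservation, and in the second both messages leave in the cycle they arrive. A timestamp-based scheme would instead have to hold each message until its reserved slot, or re-reserve when a message misses it.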

Cited By

  • (2017) BrNoC: A broadcast NoC for control messages in many-core systems. Microelectronics Journal 68, 69--77. DOI: 10.1016/j.mejo.2017.08.010. Online publication date: Oct-2017

      Published In

      ACM Transactions on Design Automation of Electronic Systems  Volume 18, Issue 4
      Special Section on Networks on Chip: Architecture, Tools, and Methodologies
      October 2013
      380 pages
      ISSN:1084-4309
      EISSN:1557-7309
      DOI:10.1145/2541012
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 25 October 2013
      Accepted: 01 July 2013
      Revised: 01 June 2013
      Received: 01 January 2013

      Qualifiers

      • Research-article
      • Research
      • Refereed
