Ordering circuit establishment in multiplane NoCs

Authors:

Ahmed Abousamra,

Rami Melhem

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 18, Issue 4

Article No.: 49, Pages 1 - 33

https://doi.org/10.1145/2500752

Published: 25 October 2013

Abstract

Segregating networks-on-chip (NoCs) into data and control planes yields several opportunities for improving power and performance in chip-multiprocessor systems (CMPs). This article describes a hybrid packet/circuit-switched multiplane network optimized to reduce latency in order to improve system performance and/or reduce system energy. Unlike traditional circuit preallocation techniques, which require timestamps to reserve circuit resources, this article proposes an order-based preallocation scheme. By enforcing the order in which resources are scheduled and utilized rather than a fixed time, the NoC can take advantage of messages that arrive early while naturally tolerating message delays due to contention. Ordered circuit establishment is presented using two techniques. First, Déjà Vu switching preestablishes circuits for data messages once a cache hit is detected and before the requested data becomes available. Second, with Red Carpet Routing, circuits are proactively reserved for a return data message as a request message traverses the NoC. The reduced communication latency over configured circuits enables either improving system performance or saving NoC energy by reducing voltage and frequency without sacrificing performance. In simulations of 16- and 64-core CMPs, Déjà Vu switching enabled average NoC energy savings of 43% and 53%, respectively. Simulations of communication-sensitive benchmarks using Red Carpet Routing show a speedup in execution time of up to 16%, with an average of 10% over a purely packet-switched NoC and an average of 8% over preconfiguring circuits using Déjà Vu switching.
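
The idea of reserving by order rather than by time can be illustrated with a toy model. The sketch below is not the article's router design: the `OutputPort` class, its method names, and the circuit labels are hypothetical, and it assumes a single output port that serves data messages strictly in the order their circuits were reserved. No timestamp is stored, so a message whose turn has come is forwarded as soon as it arrives, and a late message simply holds its place in the queue.

```python
from collections import deque


class OutputPort:
    """Toy model (hypothetical) of one output port with order-based reservations."""

    def __init__(self):
        # Circuit ids in the order they were reserved; no timestamps are kept.
        self.reservations = deque()
        # Circuits whose data has arrived but whose turn has not yet come.
        self.waiting = set()

    def reserve(self, circuit_id):
        """Record a reservation; only its position in the order matters."""
        self.reservations.append(circuit_id)

    def data_arrives(self, circuit_id):
        """Register arrived data and return the circuits forwarded as a result."""
        self.waiting.add(circuit_id)
        forwarded = []
        # Forward for as long as the oldest reservation has its data available.
        while self.reservations and self.reservations[0] in self.waiting:
            head = self.reservations.popleft()
            self.waiting.remove(head)
            forwarded.append(head)
        return forwarded


port = OutputPort()
port.reserve("A")
port.reserve("B")
early = port.data_arrives("B")  # B arrives first but A was reserved first
late = port.data_arrives("A")   # A arrives late; its reservation never expired
```

In this run `early` is an empty list and `late` is `["A", "B"]`: the delayed message A keeps its reservation, and B follows immediately once A has passed.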

Cited By

Wachter, E., Caimi, L., Fochi, V., Munhoz, D., and Moraes, F. 2017. BrNoC: A broadcast NoC for control messages in many-core systems. Microelectronics Journal 68, 69--77. Online publication date: Oct-2017.
https://doi.org/10.1016/j.mejo.2017.08.010

Index Terms

Ordering circuit establishment in multiplane NoCs
1. Networks
  1. Network types
    1. Packet-switching networks
    2. Wired access networks

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 18, Issue 4

Special Section on Networks on Chip: Architecture, Tools, and Methodologies

October 2013

380 pages

ISSN:1084-4309

EISSN:1557-7309

DOI:10.1145/2541012

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 25 October 2013

Accepted: 01 July 2013

Revised: 01 June 2013

Received: 01 January 2013

Published in TODAES Volume 18, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Division of Computing and Communication Foundations
