Abstract
As core counts grow rapidly, the NoC (Network-on-Chip) consumes an increasing fraction of the power of modern processors and SoCs (Systems-on-Chip), so designing an energy-efficient NoC architecture is important. Multi-NoC (Multiple Network-on-Chip) designs have demonstrated advantages in power gating, which reduces leakage power, a significant fraction of NoC power. In this paper, we propose Chameleon, a novel heterogeneous Multi-NoC design. Chameleon employs a fine-grained power gating algorithm that exploits power-saving opportunities at different levels of granularity simultaneously. Integrated with a congestion-aware traffic allocation policy, Chameleon achieves both high performance and low power across varying network utilization. Our experimental results on both synthetic and real workloads show that Chameleon delivers on average 2.61% higher performance than Catnap, the best prior design in the literature. More importantly, Chameleon consumes on average 27.75% less power than Catnap.









Acknowledgments
This research is supported by the National Natural Science Foundation of China (No. 61370018, 61272482) and FANEDD under Grant No. 201450.
About this article
Cite this article
Wu, J., Dong, D., Liao, X. et al. Energy-efficient NoC with multi-granularity power optimization. J Supercomput 73, 1654–1671 (2017). https://doi.org/10.1007/s11227-016-1859-8
DOI: https://doi.org/10.1007/s11227-016-1859-8