ABSTRACT
In this paper, we analyze and compare different placements of memory controllers for Chip Multiprocessors (CMPs). As the number of cores increases, Network-on-Chip (NoC) based architectures have been proposed as a promising interconnect technique for CMPs. The memory bandwidth between on-chip components and off-chip memory has become a critical problem, and integrating more memory controllers on chip is one feasible way to address it. However, the physical location of the memory controllers in a mesh-based NoC has a significant impact on system performance. We investigate the placement of multiple memory controllers in an 8x8 NoC, analyze several metrics, and find and evaluate an optimal memory controller placement. We also propose a generic "divide and conquer" method for solving the placement of memory controllers in large NoCs. Using applications selected from SPLASH-2, PARSEC, TPC and SPEC as benchmarks, we show that, compared with the conventional two-sides placement, our optimal placement reduces the average network latency, average link utilization and performance power product by 7.63%, 10.44% and 13.94%, respectively. This paper gives a solid theoretical foundation for future CMP design.
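The abstract compares placements by network latency and link utilization but does not show how a placement maps to those numbers. The Python sketch below is a rough first-order proxy, not the paper's full-system simulation. It scores a placement on an 8x8 mesh by two measures: the average hop count and the load on the busiest link. It assumes dimension-order XY routing and a hypothetical uniform traffic model in which every tile exchanges one request and one reply with every memory controller, as with fine-grained address interleaving. The 16-controller "two-sides" layout (all of rows 0 and 7) and the "rows 2 and 5" alternative are illustrative choices. Neither is claimed to be the placement evaluated or found optimal in the paper, and `xy_route` and `evaluate` are hypothetical helper names.

```python
from collections import Counter
from itertools import product

N = 8  # mesh dimension: 8x8 NoC, as in the abstract

Tile = tuple[int, int]  # (row, col) coordinate of a router/tile
Link = tuple[Tile, Tile]  # directed link between adjacent routers


def xy_route(src: Tile, dst: Tile) -> list[Link]:
    """Directed links traversed under dimension-order XY routing (columns first, then rows)."""
    links: list[Link] = []
    r, c = src
    while c != dst[1]:  # X dimension: move along the row toward the destination column
        nc = c + (1 if dst[1] > c else -1)
        links.append(((r, c), (r, nc)))
        c = nc
    while r != dst[0]:  # Y dimension: move along the column toward the destination row
        nr = r + (1 if dst[0] > r else -1)
        links.append(((r, c), (nr, c)))
        r = nr
    return links


def evaluate(mcs: list[Tile], n: int = N) -> tuple[float, int]:
    """Return (average hops per message, message count on the busiest link).

    Assumption (hypothetical traffic model): every tile sends one request to every
    memory controller and receives one reply, i.e. uniform address interleaving.
    """
    load: Counter = Counter()
    hops = messages = 0
    for tile in product(range(n), range(n)):
        for mc in mcs:
            for src, dst in ((tile, mc), (mc, tile)):  # request, then reply
                path = xy_route(src, dst)
                load.update(path)  # count one traversal per link on the path
                hops += len(path)
                messages += 1
    return hops / messages, max(load.values())


# Hypothetical 16-controller layouts, for illustration only.
two_sides = [(r, c) for r in (0, N - 1) for c in range(N)]  # rows 0 and 7 (chip edges)
inner_rows = [(r, c) for r in (2, 5) for c in range(N)]  # rows 2 and 5

for name, mcs in (("two-sides", two_sides), ("rows 2 and 5", inner_rows)):
    avg, peak = evaluate(mcs)
    print(f"{name}: avg hops = {avg:.3f}, busiest link = {peak} messages")
```

Under these assumptions, the two-sides layout averages 6.125 hops per message and the rows 2 and 5 layout averages 4.875. Moving the controllers off the chip edge shortens the vertical leg of most paths. Hop count ignores queuing, contention at the controllers, and physical constraints such as pin placement. It can therefore help rank candidate layouts, but it cannot replace cycle-accurate simulation.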
Index Terms
- Optimal memory controller placement for chip multiprocessor