DOI: 10.1145/2039370.2039405
research-article

Optimal memory controller placement for chip multiprocessor

Published: 09 October 2011

ABSTRACT

In this paper, we analyze and compare different placements of memory controllers for Chip Multiprocessors (CMPs). As the number of cores increases, Network-on-Chip (NoC) based architectures have been proposed as a promising interconnect technique for CMPs. The memory bandwidth between on-chip components and off-chip memory has become a critical bottleneck, and integrating more memory controllers on chip is one feasible way to address it. However, the physical location of memory controllers in a mesh-based NoC has a significant impact on system performance. We investigate the placement of multiple memory controllers in an 8x8 NoC, analyze several metrics, and find and evaluate an optimal placement. We also propose a generic "divide and conquer" method for solving the placement of memory controllers in large NoCs. Using applications selected from SPLASH-2, PARSEC, TPC and SPEC as benchmarks, we show that the average network latency, average link utilization and performance power product of our optimal placement are reduced by 7.63%, 10.44% and 13.94%, respectively, compared with the conventional two-sides placement. This paper gives a solid theoretical foundation for future CMP design.
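
To make the placement question concrete, the following is a minimal sketch, not the paper's method, of how one simple metric could be compared across candidate placements in an 8x8 mesh. It assumes eight controllers, dimension-ordered routing (so hop count equals Manhattan distance), and uniform traffic from every tile to every controller. The function `avg_hops` and both example placements are hypothetical illustrations; the abstract does not specify the controller count, the exact two-sides layout, or the optimal layout.

```python
from itertools import product

def avg_hops(placement, n=8):
    """Mean Manhattan hop count from every tile of an n x n mesh to every
    memory controller in `placement` (a list of (row, col) tiles).
    Assumption: uniform traffic and minimal dimension-ordered routing."""
    total = 0
    for (r, c), (mr, mc) in product(product(range(n), repeat=2), placement):
        total += abs(r - mr) + abs(c - mc)
    return total / (n * n * len(placement))

# Hypothetical "two-sides" layout: four controllers on the top row and
# four on the bottom row (the column choice is an assumption).
two_sides = [(0, c) for c in (1, 3, 4, 6)] + [(7, c) for c in (1, 3, 4, 6)]

# Hypothetical alternative used only for comparison: one controller per
# tile along the main diagonal. This is not claimed to be the paper's
# optimal placement.
diagonal = [(i, i) for i in range(8)]

if __name__ == "__main__":
    for name, placement in (("two-sides", two_sides), ("diagonal", diagonal)):
        print(f"{name}: {avg_hops(placement):.3f} average hops")
```

Average hop count is only a rough proxy: it ignores contention, link utilization and power, which are among the quantities the paper reports from benchmark-driven evaluation.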


      • Published in

        CODES+ISSS '11: Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
        October 2011
        402 pages
        ISBN:9781450307154
        DOI:10.1145/2039370

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 280 of 864 submissions, 32%
