DOI: 10.1145/1851476.1851495
research-article

A practical way to extend shared memory support beyond a motherboard at low cost

Published: 21 June 2010

ABSTRACT

Improvements in parallel computing hardware usually increase the resources available to an application, such as the number of computing cores and the amount of memory. In shared-memory computers, this growth is usually constrained by the coherency protocol, whose overhead rises with system size and limits the scalability of the final system. In this paper we propose an efficient and cost-effective way to increase the memory available to an application by leveraging free memory in other computers in the cluster.

Our proposal is based on the observation that many applications benefit from more memory but do not require more computing cores. Serving such applications reduces the requirements for cache coherency, which allows a simpler implementation and better scalability.

Simulation results show that, when additional mechanisms for hiding remote memory latency are used, applications that use our proposal run in a time similar to that required on a computer populated with enough local memory, thus validating the feasibility of our proposal. We are currently building a prototype that implements our ideas.

    • Published in

      HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
      June 2010
      911 pages
      ISBN:9781605589428
      DOI:10.1145/1851476

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 166 of 966 submissions, 17%
