
Using System Emulation to Model Next-Generation Shared Virtual Memory Clusters

Cluster Computing

Abstract

Recently, much effort has been spent on providing a shared address space abstraction on clusters of small-scale symmetric multiprocessors. However, advances in technology will soon make it possible to construct these clusters with larger-scale cc-NUMA nodes, connected by non-coherent networks that offer latencies and bandwidths comparable to those of interconnection networks used in hardware cache-coherent systems. On such systems, the shared memory abstraction can be provided in software across nodes and in hardware within nodes.

Recent simulation results have demonstrated that certain features of modern system area networks can be used to greatly reduce shared virtual memory (SVM) overheads [5,19]. In this work we leverage these results and use detailed system emulation to investigate building future software shared memory clusters. We use an existing, large-scale hardware cache-coherent system with 64 processors to emulate a complete future cluster. We port our existing infrastructure (communication layer and shared memory protocol) to this system and study the behavior of a set of real applications. We present results for both 32- and 64-processor system configurations.

We find that: (i) System emulation is invaluable in quantifying potential benefits from changes in the technology of commodity components. More importantly, it reveals potential problems in future systems that are easily overlooked in simulation studies. Thus, system emulation should be used along with other modeling techniques (e.g., simulation, implementation) to investigate future trends. (ii) Our work shows that current SVM protocols can only partially take advantage of faster interconnects and wider nodes due to operating system and architectural implications. We quantify the related issues and identify the areas where more research is required for future SVM clusters.


References

  1. A. Agarwal, B.-H. Lim, D. Kranz and J. Kubiatowicz, April: A processor architecture for multiprocessing, in: Proc. of the 17th International Symposium on Computer Architecture (ISCA17) (May 1990) pp. 104-114.

  2. D.H. Bailey, FFTs in external or hierarchical memories, Journal of Supercomputing 4 (1990) 23-25.

  3. R. Bianchini, L. Kontothanassis, R. Pinto, M.D. Maria, M. Abud and C. Amorim, Hiding communication latency and coherence overhead in software DSMs, in: Proc. of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS7) (October 1996).

  4. A. Bilas, C. Liao and J.P. Singh, Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems, in: Proc. of the 26th International Symposium on Computer Architecture (ISCA26) (May 1999).

  5. A. Bilas and J.P. Singh, The effects of communication parameters on end performance of shared virtual memory clusters, in: Proc. of the 1997 Supercomputing Conference on High Performance Networking and Computing (SC97) (November 1997).

  6. G.E. Blelloch, C.E. Leiserson, B.M. Maggs, C.G. Plaxton, S.J. Smith and M. Zagha, A comparison of sorting algorithms for the connection machine CM-2, in: Proc. of the 1st Annual ACM SIGPLAN Symposium on Parallel Algorithms and Architectures (SPAA91) (July 1991) pp. 3-16.

  7. N.J. Boden, D. Cohen, R.E. Felderman, A.E. Kulawik, C.L. Seitz, J.N. Seizovic and W. Su, Myrinet: A gigabit-per-second local area network, IEEE Micro 15(1) (February 1995) 29-36.

  8. C. Dubnicki, A. Bilas, Y. Chen, S. Damianakis and K. Li, VMMC-2: Efficient support for reliable, connection-oriented communication, in: Proc. of the 1997 IEEE Symposium on High Performance Interconnects (HOT Interconnects V) (August 1997). A short version of this appears in IEEE Micro (January/February 1998).

  9. A. Erlichson, N. Nuckolls, G. Chesson and J. Hennessy, SoftFLASH: Analyzing the performance of clustered distributed virtual shared memory, in: Proc. of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS7) (October 1996) pp. 210-220.

  10. Giganet, Giganet cLAN family of products (2001), http://www.emulex.com/products.html

  11. R. Grindley, T. Abdelrahman, S. Brown, S. Caranci, D. Devries, B. Gamsa, A. Grbic, M. Gusat, R. Ho, O. Krieger, G. Lemieux, K. Loveless, N. Manjikian, P. McHardy, S. Srblijic, M. Stumm, Z. Vranesic and Z. Zilac, The NUMAchine multiprocessor, in: Proc. of the 2000 International Conference on Parallel Processing (ICPP00), Toronto, Canada (August 2000).

  12. L. Iftode, C. Dubnicki, E.W. Felten and K. Li, Improving release-consistent shared virtual memory using automatic update, in: Proc. of the 2nd IEEE Symposium on High-Performance Computer Architecture (HPCA2) (February 1996).

  13. L. Iftode, J.P. Singh and K. Li, Understanding application performance on shared virtual memory, in: Proc. of the 23rd International Symposium on Computer Architecture (ISCA23) (May 1996).

  14. InfiniBand Trade Association, Infiniband architecture specification, version 1.0 (October 2000), http://www.infinibandta.org

  15. D. Jiang, B. Cokelley, X. Yu, A. Bilas and J.P. Singh, Application scaling under shared virtual memory on a cluster of SMPs, in: Proc. of the 13th ACM International Conference on Supercomputing (ICS99) (June 1999) pp. 165-174.

  16. D. Jiang, H. Shan and J.P. Singh, Application restructuring and performance portability across shared virtual memory and hardware-coherent multiprocessors, in: Proc. of the 1997 ACM Symposium on Principles and Practice of Parallel Programming (PPoPP97) (June 1997).

  17. D. Jiang and J.P. Singh, Does application performance scale on cache-coherent multiprocessors: A snapshot, in: Proc. of the 26th International Symposium on Computer Architecture (ISCA26) (May 1999).

  18. L.I. Kontothanassis, G. Hunt, R. Stets, N. Hardavellas, M. Cierniak, S. Parthasarathy, W. Meira, Jr., S. Dwarkadas and M.L. Scott, VM-based shared memory on low-latency, remote-memory-access networks, in: Proc. of the 24th Annual International Symposium on Computer Architecture (ISCA'97) (June 1997) pp. 157-169.

  19. L.I. Kontothanassis and M.L. Scott, Using memory-mapped network interfaces to improve the performance of distributed shared memory, in: Proc. of the 2nd IEEE Symposium on High-Performance Computer Architecture (HPCA2) (February 1996).

  20. J.P. Laudon and D. Lenoski, The SGI Origin2000: A scalable cc-NUMA server, in: Proc. of the 24th International Symposium on Computer Architecture (ISCA24) (June 1997).

  21. D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz and M. Lam, Design of the Stanford DASH multiprocessor, Technical Report CSL-TR-89-403, Stanford University (December 1989).

  22. J. Nieh and M. Levoy, Volume rendering on scalable shared-memory MIMD architectures, in: Proc. of the Boston Workshop on Volume Visualization (October 1992).

  23. S. Pakin, M. Buchanan, M. Lauria and A. Chien, The Fast Messages (FM) 2.0 streaming interface, in: Usenix'97 (1996).

  24. R. Samanta, A. Bilas, L. Iftode and J.P. Singh, Home-based SVM protocols for SMP clusters: Design, simulations, implementation and performance, in: Proc. of the 4th IEEE Symposium on High-Performance Computer Architecture (HPCA4) (February 1998).

  25. R. Stets, S. Dwarkadas, N. Hardavellas, G. Hunt, L. Kontothanassis, S. Parthasarathy and M. Scott, Cashmere-2L: Software coherent shared memory on a clustered remote-write network, in: Proc. of the 16th ACM Symposium on Operating Systems Principles (SOSP-16) (October 1997).

  26. S.C. Woo, M. Ohara, E. Torrie, J.P. Singh and A. Gupta, The SPLASH-2 programs: Characterization and methodological considerations, in: Proc. of the 22nd International Symposium on Computer Architecture (ISCA22), Santa Margherita Ligure, Italy (June 1995) pp. 24-36.

  27. D. Yeung, J. Kubiatowicz and A. Agarwal, Multigrain shared memory, ACM Transactions on Computer Systems 18(2) (May 2000) 154-196.

  28. Y. Zhou, L. Iftode and K. Li, Performance evaluation of two home-based lazy release consistency protocols for shared virtual memory systems, in: Proc. of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI96) (October 1996).

Author information

Corresponding author

Correspondence to Angelos Bilas.

About this article

Cite this article

Bilas, A., Gibson, C.R., Azimi, R. et al. Using System Emulation to Model Next-Generation Shared Virtual Memory Clusters. Cluster Computing 6, 325–338 (2003). https://doi.org/10.1023/A:1025713926030

  • DOI: https://doi.org/10.1023/A:1025713926030
