Using System Emulation to Model Next-Generation Shared Virtual Memory Clusters

Bilas, Angelos; Gibson, Courtney R.; Azimi, Reza; Christodoulopoulou, Rosalia; Jamieson, Peter

doi:10.1023/A:1025713926030

Published: October 2003

Cluster Computing, Volume 6, pages 325–338 (2003)

Abstract

Recently, much effort has been spent on providing a shared address space abstraction on clusters of small-scale symmetric multiprocessors. However, advances in technology will soon make it possible to construct these clusters with larger-scale cc-NUMA nodes, connected by non-coherent networks that offer latencies and bandwidth comparable to those of the interconnection networks used in hardware cache-coherent systems. On such systems, the shared memory abstraction can be provided in software across nodes and in hardware within nodes.

Recent simulation results have demonstrated that certain features of modern system area networks can greatly reduce shared virtual memory (SVM) overheads [5,19]. In this work we build on these results and use detailed system emulation to investigate how future software shared memory clusters can be built. We use an existing large-scale hardware cache-coherent system with 64 processors to emulate a complete future cluster. We port our existing infrastructure (communication layer and shared memory protocol) to this system and study the behavior of a set of real applications. We present results for both 32- and 64-processor system configurations.

We find that: (i) system emulation is invaluable in quantifying the potential benefits of changes in the technology of commodity components. More importantly, it reveals potential problems in future systems that are easily overlooked in simulation studies. Thus, system emulation should be used alongside other modeling techniques (e.g., simulation, implementation) to investigate future trends. (ii) Current SVM protocols can take only partial advantage of faster interconnects and wider nodes, because of operating system and architectural implications. We quantify the related issues and identify the areas where more research is required for future SVM clusters.


Similar content being viewed by others

Modeling Large Compute Nodes with Heterogeneous Memories with Cache-Aware Roofline Model

COMPESCE: A Co-design Approach for Memory Subsystem Performance Analysis in HPC Many-Cores

Prototyping a Configurable Cache/Scratchpad Memory with Virtualized User-Level RDMA Capability

References

A. Agarwal, B.-H. Lim, D. Kranz and J. Kubiatowicz, APRIL: A processor architecture for multiprocessing, in: Proc. of the 17th International Symposium on Computer Architecture (ISCA17) (May 1990) pp. 104-114.
D.H. Bailey, FFTs in external or hierarchical memories, Journal of Supercomputing 4 (1990) 23-25.
R. Bianchini, L. Kontothanassis, R. Pinto, M.D. Maria, M. Abud and C. Amorim, Hiding communication latency and coherence overhead in software DSMs, in: Proc. of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS7) (October 1996).
A. Bilas, C. Liao and J.P. Singh, Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems, in: Proc. of the 26th International Symposium on Computer Architecture (ISCA26) (May 1999).
A. Bilas and J.P. Singh, The effects of communication parameters on end performance of shared virtual memory clusters, in: Proc. of the 1997 Supercomputing Conference on High Performance Networking and Computing (SC97) (November 1997).
G.E. Blelloch, C.E. Leiserson, B.M. Maggs, C.G. Plaxton, S.J. Smith and M. Zagha, A comparison of sorting algorithms for the connection machine CM-2, in: Proc. of the 1st Annual ACM SIGPLAN Symposium on Parallel Algorithms and Architectures (SPAA91) (July 1991) pp. 3-16.
N.J. Boden, D. Cohen, R.E. Felderman, A.E. Kulawik, C.L. Seitz, J.N. Seizovic and W. Su, Myrinet: A gigabit-per-second local area network, IEEE Micro 15(1) (February 1995) 29-36.
C. Dubnicki, A. Bilas, Y. Chen, S. Damianakis and K. Li, VMMC-2: Efficient support for reliable, connection-oriented communication, in: Proc. of the 1997 IEEE Symposium on High Performance Interconnects (HOT Interconnects V) (August 1997). A short version of this appears in IEEE Micro (January/February 1998).
A. Erlichson, N. Nuckolls, G. Chesson and J. Hennessy, SoftFLASH: Analyzing the performance of clustered distributed virtual shared memory, in: Proc. of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS7) (October 1996) pp. 210-220.
Giganet, Giganet cLAN family of products (2001), http://www.emulex.com/products.html
R. Grindley, T. Abdelrahman, S. Brown, S. Caranci, D. Devries, B. Gamsa, A. Grbic, M. Gusat, R. Ho, O. Krieger, G. Lemieux, K. Loveless, N. Manjikian, P. McHardy, S. Srbljic, M. Stumm, Z. Vranesic and Z. Zilic, The NUMAchine multiprocessor, in: Proc. of the 2000 International Conference on Parallel Processing (ICPP00), Toronto, Canada (August 2000).
L. Iftode, C. Dubnicki, E.W. Felten and K. Li, Improving release-consistent shared virtual memory using automatic update, in: Proc. of the 2nd IEEE Symposium on High-Performance Computer Architecture (HPCA2) (February 1996).
L. Iftode, J.P. Singh and K. Li, Understanding application performance on shared virtual memory, in: Proc. of the 23rd International Symposium on Computer Architecture (ISCA23) (May 1996).
InfiniBand Trade Association, InfiniBand architecture specification, version 1.0 (October 2000), http://www.infinibandta.org
D. Jiang, B. O'Kelley, X. Yu, A. Bilas and J.P. Singh, Application scaling under shared virtual memory on a cluster of SMPs, in: Proc. of the 13th ACM International Conference on Supercomputing (ICS99) (June 1999) pp. 165-174.
D. Jiang, H. Shan and J.P. Singh, Application restructuring and performance portability across shared virtual memory and hardware-coherent multiprocessors, in: Proc. of the 1997 ACM Symposium on Principles and Practice of Parallel Programming (PPoPP97) (June 1997).
D. Jiang and J.P. Singh, Does application performance scale on cache-coherent multiprocessors: A snapshot, in: Proc. of the 26th International Symposium on Computer Architecture (ISCA26) (May 1999).
L.I. Kontothanassis, G. Hunt, R. Stets, N. Hardavellas, M. Cierniak, S. Parthasarathy, W. Meira, Jr., S. Dwarkadas and M.L. Scott, VM-based shared memory on low-latency, remote-memory-access networks, in: Proc. of the 24th Annual International Symposium on Computer Architecture (ISCA'97) (June 1997) pp. 157-169.
L.I. Kontothanassis and M.L. Scott, Using memory-mapped network interfaces to improve the performance of distributed shared memory, in: Proc. of the 2nd IEEE Symposium on High-Performance Computer Architecture (HPCA2) (February 1996).
J.P. Laudon and D. Lenoski, The SGI Origin2000: A scalable cc-NUMA server, in: Proc. of the 24th International Symposium on Computer Architecture (ISCA24) (June 1997).
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz and M. Lam, Design of the Stanford DASH multiprocessor, Technical Report CSL-TR-89-403, Stanford University (December 1989).
J. Nieh and M. Levoy, Volume rendering on scalable shared-memory MIMD architectures, in: Proc. of the Boston Workshop on Volume Visualization (October 1992).
S. Pakin, M. Buchanan, M. Lauria and A. Chien, The Fast Messages (FM) 2.0 streaming interface, in: Usenix'97 (1996).
R. Samanta, A. Bilas, L. Iftode and J.P. Singh, Home-based SVM protocols for SMP clusters: Design, simulations, implementation and performance, in: Proc. of the 4th IEEE Symposium on High-Performance Computer Architecture (HPCA4) (February 1998).
R. Stets, S. Dwarkadas, N. Hardavellas, G. Hunt, L. Kontothanassis, S. Parthasarathy and M. Scott, Cashmere-2L: Software coherent shared memory on a clustered remote-write network, in: Proc. of the 16th ACM Symposium on Operating Systems Principles (SOSP-16) (October 1997).
S.C. Woo, M. Ohara, E. Torrie, J.P. Singh and A. Gupta, The SPLASH-2 programs: Characterization and methodological considerations, in: Proc. of the 22nd International Symposium on Computer Architecture (ISCA22), Santa Margherita Ligure, Italy (June 1995) pp. 24-36.
D. Yeung, J. Kubiatowicz and A. Agarwal, Multigrain shared memory, ACM Transactions on Computer Systems 18(2) (May 2000) 154-196.
Y. Zhou, L. Iftode and K. Li, Performance evaluation of two home-based lazy release consistency protocols for shared virtual memory systems, in: Proc. of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI96) (October 1996).

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada, M5S 3G4
Angelos Bilas, Courtney R. Gibson, Reza Azimi & Peter Jamieson
Department of Computer Science, University of Toronto, Toronto, ON, Canada, M5S 3G4
Rosalia Christodoulopoulou

Corresponding author

Correspondence to Angelos Bilas.

About this article

Cite this article

Bilas, A., Gibson, C.R., Azimi, R. et al. Using System Emulation to Model Next-Generation Shared Virtual Memory Clusters. Cluster Computing 6, 325–338 (2003). https://doi.org/10.1023/A:1025713926030

Issue Date: October 2003
DOI: https://doi.org/10.1023/A:1025713926030
