Abstract
Recently, much effort has been spent on providing a shared address space abstraction on clusters of small-scale symmetric multiprocessors. However, advances in technology will soon make it possible to construct these clusters with larger-scale cc-NUMA nodes, connected with non-coherent networks that offer latencies and bandwidth comparable to interconnection networks used in hardware cache-coherent systems. The shared memory abstraction can be provided on these systems in software across nodes and in hardware within nodes. In this work we investigate this approach to building future software shared memory clusters. We use an existing, large-scale hardware cache-coherent system with 64 processors to emulate a future cluster. We present results for both 32- and 64-processor system configurations. We quantify the effects of faster interconnects and wide NUMA nodes on system design and identify the areas where more research is required for future shared virtual memory (SVM) clusters. We find that current SVM protocols can only partially take advantage of faster interconnects and need to be adjusted to the new system features. In particular, unlike in today’s clusters that employ SMP nodes, improving intra-node synchronization and data placement are key issues for future clusters. Data wait time and synchronization costs are not major issues when they are not affected by the cost of page invalidations.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gibson, C.R., Bilas, A. (2001). Shared Virtual Memory Clusters with Next-Generation Interconnection Networks and Wide Compute Nodes. In: Monien, B., Prasanna, V.K., Vajapeyam, S. (eds) High Performance Computing — HiPC 2001. HiPC 2001. Lecture Notes in Computer Science, vol 2228. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45307-5_15
DOI: https://doi.org/10.1007/3-540-45307-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43009-4
Online ISBN: 978-3-540-45307-9
eBook Packages: Springer Book Archive