DOI: 10.1145/237090.237187

SoftFLASH: analyzing the performance of clustered distributed virtual shared memory

Published: 01 September 1996

ABSTRACT

One potentially attractive way to build large-scale shared-memory machines is to use small-scale to medium-scale shared-memory machines as clusters that are interconnected with an off-the-shelf network. To create a shared-memory programming environment across the clusters, it is possible to use a virtual shared-memory software layer. Because of the low latency and high bandwidth of the interconnect available within each cluster, there are clear advantages in making the clusters as large as possible. The critical question then becomes whether the latency and bandwidth of the top-level network and the software system are sufficient to support the communication demands generated by the clusters.

To explore these questions, we have built an aggressive kernel implementation of a virtual shared-memory system using SGI multiprocessors and 100 Mbyte/sec HIPPI interconnects. The system obtains speedups on 32 processors (four nodes, eight processors per node plus additional reserved protocol processors) that range from 6.9 on the communication-intensive FFT program to 21.6 on Ocean (both from the SPLASH-2 suite). In general, clustering is effective in reducing internode miss rates, but as the cluster size increases, increases in the remote latency, mostly due to increased TLB synchronization cost, offset the advantages. For communication-intensive applications, such as FFT, the overhead of sending out network requests, the limited network bandwidth, and the long network latency prevent the achievement of good performance. Overall, this approach still appears promising, but our results indicate that large low-latency networks may be needed to make cluster-based virtual shared-memory machines broadly useful as large-scale shared-memory multiprocessors.
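To put the reported speedups in perspective, the following minimal sketch converts them into parallel efficiency. It assumes efficiency is defined as speedup divided by the number of compute processors (32), leaving out the reserved protocol processors; that definition and the names `parallel_efficiency`, `SPEEDUPS`, and `COMPUTE_PROCESSORS` are illustrative assumptions, not taken from the paper.

```python
# Figures quoted in the abstract: speedups measured on 32 compute processors
# (4 nodes x 8 processors per node). Assumption: the reserved protocol
# processors are not counted in the denominator.
COMPUTE_PROCESSORS = 4 * 8

SPEEDUPS = {"FFT": 6.9, "Ocean": 21.6}


def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (1.0 would be ideal scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


for name, speedup in SPEEDUPS.items():
    eff = parallel_efficiency(speedup, COMPUTE_PROCESSORS)
    print(f"{name}: speedup {speedup} on {COMPUTE_PROCESSORS} processors, "
          f"efficiency {eff:.1%}")
```

Under this definition, FFT uses roughly a fifth of the ideal capacity and Ocean roughly two thirds, which is consistent with the abstract's distinction between communication-intensive programs and the rest.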

        • Published in

          ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
          October 1996
          290 pages
          ISBN: 0897917677
          DOI: 10.1145/237090

          Copyright © 1996 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Qualifiers

          • Article

          Acceptance Rates

          ASPLOS VII paper acceptance rate: 25 of 109 submissions, 23%
          Overall acceptance rate: 535 of 2,713 submissions, 20%
