Abstract
We show how novel active memory system research and system networking trends can be combined to realize hardware distributed shared memory on clusters of industry-standard workstations. Our active memory controller extends the cache coherence protocol to support transparent use of address re-mapping techniques that dramatically improve single-node performance, and also contains the necessary functionality for building a hardware DSM machine. Simultaneously, commodity network technology is becoming more tightly-integrated with the memory controller. We call our design of active memory commodity nodes interconnected by a next-generation network active memory clusters. We present a detailed design of the AMC architecture, focusing on the active memory controller and the network characteristics necessary to support AMC. We show simulation results for a range of parallel applications, showing that AMC performance is comparable to that of custom hardware DSM systems and far exceeds that of the fastest software DSM solutions.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Bilas, A., Liao, C., Singh, J.P.: Using Network Interface Support to Avoid Asynchronous Protocol Processing in Shared Virtual Memory Systems. In Proceedings of the 26th International Symposium on Computer Architecture, May 1999.
Carter, J. B., et al.: Design of the Munin Distributed Shared Memory System. Journal of Parallel and Distributed Computing, 29(2):219–227, September 1995.
Carter, J. B., et al.: Impulse: Building a Smarter Memory Controller. In Proceedings of the Fifth International Symposium on High Performance Computer Architecture January 1999.
Gokhale, M., Holmes, B., Iobst, K.: Processing in Memory: the Terasys Massively Parallel PIM Array. Computer, 28(3):23–31, April 1995.
Hall, M., et al.: Mapping Irregular Applications to DIVA, A PIM-based Data-Intensive Architecture. Supercomputing, Portland, OR, Nov. 1999.
Heinrich, M., Speight, E.: Active Memory Clusters: Efficient Multiprocessing on Next-Generation Servers. Technical Report CSL-TR-2001-1014, Computer Systems Lab, Cornell University, August, 2001.
InfiniBand Architecture Specification, Volume 1.0, Release 1.0. InfiniBand Trade Association, October 24, 2000.
Keleher, P., et al.: TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. In Proceedings of the Winter 1994 USENIX Conference, pages 115–132, January 1994.
Kang, Y., et al.: FlexRAM: Toward an Advanced Intelligent Memory System. International Conference on Computer Design, October 1999.
Kim, D., Chaudhuri, M., Heinrich, M.: Leveraging Cache Coherence in Active Memory Systems. Technical Report CSL-TR-2001-1018, Computer Systems Laboratory, Cornell University, November 2001.
Kuskin, J., et al.: The Stanford FLASH Multiprocessor. In Proceedings of the 21st International Symposium on Computer Architecture, pages 302–313, April 1994.
Laudon, J., Lenoski, D.: The SGI Origin: A ccNUMA Highly Scalable Server. In Proceedings of the 24th International Symposium on Computer Architecture, pages 241–251, June 1997.
Lenoski, D., et al.: The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63–79, March 1992.
Li, K., Hudak, P.: Memory Coherence in Shared Virtual Memory Systems. In ACM Transactions on Computer Systems, 7(4):321–359, November 1989.
Manohar, R., Heinrich, M.: A Case for Asynchronous Active Memories. In ISCA 2000 Solving the Memory Wall Problem Workshop, June 2000.
Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, Version 1.0, 1994.
Nowatzyk, A., et al.: The S3.mp Scalable Shared Memory Multiprocessor. In Proceedings of the 24th International Conference on Parallel Processing, 1995.
Oskin, M., Chong, F. T., Sherwood, T.: Active Pages: A Computation Model for Intelligent Memory. In Proceedings of the 25th International Symposium on Computer Architecture, 1998.
Saulsbury, A., Pong, F., Nowatzyk, A.: Missing the Memory Wall: The Case for Processor/Memory Integration. In Proceedings of the 23rd International Symposium on Computer Architecture, pages 90–101, May 1996.
Scales, D. J., Gharachorloo, K., Thekkath, C. A.: Shasta: A Low-Overhead Software-Only Approach for Supporting Fine-Grain Shared Memory. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, pages 174–185, October 1996.
Soundararajan, R., et al.: Flexible Use of Memory for Replication/Migration in Cache-Coherent DSM Multiprocessors. In Proceedings of the 25th International Symposium on Computer Architecture, pages 342–355, June 1998.
Speight, E., Bennett, J. K.: Brazos: A Third Generation DSM System. In Proceedings of the First Usenix Windows NT Symposium, August 1997.
Torrellas, J., Yang, L., Nguyen, A.-T.: Toward a Cost-Effective DSM Organization that Exploits Processor-Memory Integration In Proceedings of the 6th International Symposium on High-Performance Computer Architecture, January 2000.
Woo, S. C., et al.: The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd International Symposium on Computer Architecture, pages 24–36, June 1995.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Heinrich, M., Speight, E., Chaudhuri, M. (2002). Active Memory Clusters: Efficient Multiprocessing on Commodity Clusters. In: Zima, H.P., Joe, K., Sato, M., Seo, Y., Shimasaki, M. (eds) High Performance Computing. ISHPC 2002. Lecture Notes in Computer Science, vol 2327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47847-7_9
Download citation
DOI: https://doi.org/10.1007/3-540-47847-7_9
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43674-4
Online ISBN: 978-3-540-47847-8
eBook Packages: Springer Book Archive