Abstract
The SB-PRAM is a shared-memory parallel computer designed according to the PRAM model from theoretical computer science. It realizes a concurrent-read, concurrent-write PRAM in which each processor can access the global memory in unit time. This article describes the programming environment of the SB-PRAM, which enables a programmer to develop efficient and portable programs without dealing with architectural details of the machine. In particular, we discuss compiler and operating system issues and show that the runtime functions of the P4 environment and several parallel data structures can be implemented very efficiently by using special features of the SB-PRAM. In contrast to other parallel machines, the synchronization of processors and the management of concurrent accesses to the global memory require only a few machine instructions, regardless of the number of processors participating in the operation. This efficient implementation of the runtime system is the basis for good performance of many challenging applications.
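The claim about concurrent accesses can be illustrated with a small sketch. The abstract does not name the special features involved, so the following assumes a fetch-and-add style primitive: each processor atomically reads a shared counter and increments it, which gives every participant a distinct slot in a shared array without a critical section in user code. The names `FetchAddCounter` and `parallel_append` are hypothetical, and the lock inside the counter only emulates in software what such a machine would provide as a single operation.

```python
import threading


class FetchAddCounter:
    """Software stand-in for a fetch-and-add memory cell (an assumption,
    not the SB-PRAM instruction set)."""

    def __init__(self, start=0):
        self._value = start
        # Emulation only: the lock makes the read-modify-write atomic here.
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        """Atomically add delta and return the value seen before the add."""
        with self._lock:
            old = self._value
            self._value += delta
            return old


def parallel_append(items_per_worker):
    """Each worker reserves array slots via fetch_add and fills them."""
    total = sum(len(items) for items in items_per_worker)
    shared = [None] * total
    tail = FetchAddCounter()

    def worker(items):
        for item in items:
            slot = tail.fetch_add(1)  # every caller receives a distinct index
            shared[slot] = item

    threads = [threading.Thread(target=worker, args=(items,))
               for items in items_per_worker]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


# Four workers append 100 items each into one shared array.
result = parallel_append([[f"p{p}-{i}" for i in range(100)] for p in range(4)])
```

In this sketch the per-item cost in user code is one `fetch_add` call and one store, whatever the number of workers. Whether the hardware cost also stays constant depends on how the machine services simultaneous requests to the same cell, which the abstract asserts for the SB-PRAM but does not detail.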
Grün, T., Rauber, T. & Röhrig, J. Support for Efficient Programming on the SB-PRAM. International Journal of Parallel Programming 26, 209–240 (1998). https://doi.org/10.1023/A:1018749028569
DOI: https://doi.org/10.1023/A:1018749028569