
Support for Efficient Programming on the SB-PRAM

International Journal of Parallel Programming

Abstract

The SB-PRAM is a shared-memory parallel computer designed according to the PRAM model from theoretical computer science. It realizes a concurrent-read, concurrent-write PRAM in which each processor can access the global memory in unit time. This article describes the programming environment of the SB-PRAM, which enables a programmer to develop efficient and portable programs without dealing with architectural details of the machine. In particular, we discuss compiler and operating system issues and show that the runtime functions of the P4 environment and several parallel data structures can be implemented very efficiently by using special features of the SB-PRAM. In contrast to other parallel machines, the synchronization of processors and the management of concurrent accesses to the global memory require only a few machine instructions, independent of the number of processors participating in the operation. This efficient implementation of the runtime system is the basis for the good performance of many challenging applications.
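To make the constant-instruction claim concrete, here is a minimal sketch of how a parallel data structure and a barrier arrival can each be built from one atomic read-modify-write per processor. The abstract does not name the hardware feature involved, so the sketch assumes a fetch-and-add style primitive and uses C11 `atomic_fetch_add` as a stand-in. All type and function names (`shared_queue`, `queue_append`, `arrival_counter`, `arrive`) are hypothetical and are not taken from the P4 library or the SB-PRAM runtime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared queue; names and layout are illustrative only. */
#define QUEUE_CAPACITY 1024

typedef struct {
    atomic_size_t tail;         /* index of the next free slot */
    int items[QUEUE_CAPACITY];  /* slot storage */
} shared_queue;

void queue_init(shared_queue *q) {
    atomic_init(&q->tail, 0);
}

/* Reserve a private slot with a single fetch-and-add, then fill it.
   No lock is taken, so the instruction count per caller does not
   depend on how many processors append at the same time.
   Returns the slot index, or -1 if the queue is full. */
long queue_append(shared_queue *q, int value) {
    size_t slot = atomic_fetch_add(&q->tail, 1);
    if (slot >= QUEUE_CAPACITY) {
        return -1;
    }
    q->items[slot] = value;
    return (long)slot;
}

/* Hypothetical barrier arrival counter (assumption: one-shot use). */
typedef struct {
    atomic_uint arrived;  /* number of processors that have arrived */
    unsigned parties;     /* number of processors expected */
} arrival_counter;

void arrival_init(arrival_counter *c, unsigned parties) {
    assert(parties > 0);
    atomic_init(&c->arrived, 0);
    c->parties = parties;
}

/* Register one arrival with a single fetch-and-add.
   Returns 1 for the last processor to arrive, 0 for all others. */
int arrive(arrival_counter *c) {
    unsigned before = atomic_fetch_add(&c->arrived, 1);
    return before + 1 == c->parties;
}
```

On conventional cache-coherent hardware, concurrent fetch-and-add operations on one address are typically serialized, so the sketch shows only the programming pattern. The independence from processor count described in the abstract is a property of the SB-PRAM itself.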


Grün, T., Rauber, T. & Röhrig, J. Support for Efficient Programming on the SB-PRAM. International Journal of Parallel Programming 26, 209–240 (1998). https://doi.org/10.1023/A:1018749028569
