Abstract
In a SIMD or VLIW machine, conceptual synchronizations are accomplished by using a static code schedule that does not require run-time synchronization. The lack of run-time synchronization overhead makes these machines very effective for fine-grain parallelism, but they cannot execute parallel code structures as general as those executed by MIMD architectures, and this limits their utility.
In this paper we present a timing analysis that allows a compiler for a MIMD machine to eliminate a large fraction of the run-time synchronization by making efficient use of static code scheduling. Although these techniques can be adapted to most MIMD machines, this paper centers on the analysis and scheduling for barrier MIMD machines. Barrier MIMDs are asynchronous multiple instruction stream/multiple data stream architectures capable of parallel execution of variable execution-time instructions and arbitrary control flow (e.g., while loops and calls). However, they also incorporate a special hardware barrier synchronization mechanism that facilitates static scheduling by giving the compiler a way to enforce precise timing constraints. In other words, the compiler tracks relative timing between processors and uses static code scheduling until the timing imprecision becomes too large, at which point the compiler simply inserts a barrier to reduce that timing imprecision to zero (or a small constant).
This paper describes new scheduling and barrier placement algorithms for barrier MIMDs that are based loosely on the list scheduling approach employed for VLIWs [Ellis 1985]. In addition, experimental results from scheduling thousands of synthetic benchmark programs for a parameterized barrier MIMD machine are presented.
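The idea of tracking timing imprecision and inserting a barrier only when it grows too large can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified illustration that assumes each operation has a known minimum and maximum execution time, that operations are already assigned to processors and listed in a valid order, and that a barrier resets every processor to the same time. The names `Op` and `place_barriers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Op:
    proc: int         # processor the operation is assigned to (assumed given)
    tmin: int         # minimum execution time
    tmax: int         # maximum execution time
    deps: tuple = ()  # indices of earlier operations this one depends on


def place_barriers(ops, num_procs):
    """Return the indices of operations before which a barrier is inserted."""
    # Earliest and latest elapsed time on each processor since the last barrier.
    lo = [0] * num_procs
    hi = [0] * num_procs
    finish = {}   # op index -> (processor, latest finish time, barrier epoch)
    epoch = 0     # incremented at every barrier
    barriers = []
    for i, op in enumerate(ops):
        # A cross-processor dependence is statically safe only if the
        # producer's latest finish is no later than the consumer's earliest
        # start. Producers from before the last barrier are always finished.
        unsafe = any(
            finish[d][0] != op.proc
            and finish[d][2] == epoch
            and finish[d][1] > lo[op.proc]
            for d in op.deps
        )
        if unsafe:
            # Insert a barrier: timing imprecision drops back to zero.
            barriers.append(i)
            epoch += 1
            lo = [0] * num_procs
            hi = [0] * num_procs
        lo[op.proc] += op.tmin
        hi[op.proc] += op.tmax
        finish[i] = (op.proc, hi[op.proc], epoch)
    return barriers
```

In this sketch, a consumer on processor 1 that may start as early as time 1 cannot safely follow a producer on processor 0 that may finish as late as time 4, so a barrier is placed before it. If the producer is guaranteed to finish by time 2 and the consumer cannot start before time 3, the static schedule alone enforces the dependence and no barrier is needed.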
References
Aho, A.V., Hopcroft, J.E., and Ullman, J.D. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass.
Aho, A.V., Sethi, R., and Ullman, J.D. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Mass.
Alexander, W.G., and Wortman, D.B. 1975. Static and dynamic characteristics of XPL programs. IEEE Comp. (Nov.), 41–46.
Bronson, E.C., Casavant, T.L., and Jamieson, L.H. 1990. Experimental application-driven architecture analysis of a SIMD/MIMD parallel processing system. IEEE Trans. Parallel and Distributed Systems, 1, 2 (Apr.), 195–205.
Callahan II, C.D. 1987. A global approach to the detection of parallelism. Ph.D. diss., Rice Univ., Houston, Tex.
Colwell, R.P., Nix, R.P., O'Donnell, J.J., Papworth, D.B., and Rodman, P.K. 1988. A VLIW architecture for a trace scheduling compiler. IEEE Trans. Comps., C-37, 8 (Aug.), 967–979.
Dietz, H.G., and Schwederski, T. 1988. Extending static synchronization beyond VLIW. Tech. rept. TR-EE 88–25, School of Electrical Engineering, Purdue Univ., West Lafayette, Ind.
Dietz, H.G., O'Keefe, M.T., and Zaafrani, A. 1990. An introduction to static scheduling for MIMD architectures. In Preliminary Proc., 3rd Workshop on Programming Languages and Compilers for Parallel Computing (Irvine, Calif., Aug.), 26 pp. Also in Advances in Languages and Compilers for Parallel Processing, A. Nicolau et al. (eds.), MIT Press, Cambridge, Mass., 1991.
Dietz, H.G., Schwederski, T., O'Keefe, M.T., and Zaafrani, A. 1989. Extending static synchronization beyond VLIW. In Proc., Supercomputing '89 (Reno, Nev., Nov.), pp. 416–425.
Dietz, H.G., Siegel, H.J., Cohen, W.E., O'Keefe, M.T., et al. 1989. A compiler-oriented architecture: The CARP machine. In Fourth SIAM Conf. on Parallel Processing for Scientific Computing (Chicago, Dec.).
Dreyfus, S.E. 1969. An appraisal of some shortest-path algorithms. ORSA, 17: 395–412.
Ellis, J.R. 1985. Bulldog: A Compiler for VLIW Architectures. MIT Press, Cambridge, Mass.
Fishburn, P.C. 1985. Interval Orders and Interval Graphs: A Study of Partially Ordered Sets. John Wiley & Sons, New York.
Fisher, J.A. 1984. The VLIW machine: A multiprocessor for compiling scientific code. IEEE Comp. (July), 45–53.
Fox, B.L. 1973. Calculating kth shortest paths. INFOR, 11: 66–70.
Hoffman, W., and Pavley, R. 1959. A method for the solution of the Nth best path problem. JACM, 6: 506–514.
Hu, T.C. 1982. Combinatorial Algorithms. Addison-Wesley, Reading, Mass.
Kasahara, H., and Narita, S. 1984. Practical multiprocessor scheduling algorithms for efficient parallel processing. IEEE Trans. Comps., C-33, 11 (Nov.), 1023–1029.
O'Keefe, M.T., and Dietz, H.G. 1990a. Hardware barrier synchronization: Dynamic barrier MIMD (DBM). In 1990 Internat. Conf. on Parallel Processing, vol. 1 (St. Charles, Ill., Aug.), pp. 43–45.
O'Keefe, M.T., and Dietz, H.G. 1990b. Hardware barrier synchronization: Static barrier MIMD (SBM). In 1990 Internat. Conf. on Parallel Processing, vol. 1 (St. Charles, Ill., Aug.), pp. 35–42.
Schwederski, T., Nation, W.G., Siegel, H.J., and Meyer, D.G. 1987. The implementation of the PASM prototype control hierarchy. In Proc., Second Internat. Conf. on Supercomputing, vol. 1, pp. 418–427.
Shaffer, P.L. 1989. Minimization of interprocessor synchronization in multiprocessors with shared and private memory. In Proc., 1989 Internat. Conf. on Parallel Processing, vol. 3 (St. Charles, Ill., Aug.), pp. 138–142.
Zaafrani, A. 1990. Static scheduling of barrier MIMD architecture. M.S. diss., School of Electrical Engineering, Purdue Univ., West Lafayette, Ind.
Cite this article
Dietz, H.G., Zaafrani, A. & O'Keefe, M.T. Static scheduling for barrier MIMD architectures. J Supercomput 5, 263–289 (1992). https://doi.org/10.1007/BF00127949