A Unifying Lattice-Based Approach for the Partitioning of Systolic Arrays via LPGS and LSGP

Published in: Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology

Abstract

Various methods for synthesizing systolic arrays from signal and image processing algorithms have been developed in the past few years. In this paper, we propose a technique for the partitioning problem, i.e., the problem of synthesizing systolic arrays whose size does not match the problem size. Our technique generalizes most of the known lattice-based approaches to the partitioning problem and combines the multiprojection method for the synthesis of systolic arrays with the locally sequential-globally parallel (LSGP) and locally parallel-globally sequential (LPGS) partitioning schemes. Starting from (1) a k-dimensional large-size systolic array obtained from a system of n-dimensional uniform recurrences by a space-time transformation and (2) an arbitrary lattice in k-space inducing a partitioning of the array into subarrays, a small-size systolic array with a scalar-valued system clock is constructed via the LSGP or LPGS paradigm. In particular, the allocation function for the small-size array can be written in closed form, and the timing function is obtained by simple greedy algorithms from timing functions for the subdomains, the sets of operations performed by the subarrays. In this way, the problem of finding optimal timing functions can in various cases be reduced to finding optimal timing functions for the subdomains. For problems of large size, these greedy algorithms seem preferable to existing integer or non-convex programming formulations for finding (sub-)optimal timing functions. We also provide several new results: a necessary and sufficient condition for the existence of counter data flow, a formal relationship between partitionings of the processor space and the index space of the uniform recurrences in terms of counter data flow, and the structural equivalence between the lattice-based LSGP and LPGS schemes applied to the partitioning of index and processor space.
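The difference between the two partitioning paradigms can be illustrated on a one-dimensional toy case. The sketch below (our own notation, not the paper's closed-form allocation and timing functions) maps the cells of a large "virtual" systolic array onto a fixed number of physical cells: under LSGP, each physical cell sequentially emulates one contiguous block of virtual cells, while under LPGS, consecutive virtual cells are spread across the physical cells in parallel and the blocks are processed as sequential passes.

```python
def lsgp(v, block):
    """LSGP: locally sequential, globally parallel.
    Virtual cell v is handled by physical cell v // block,
    in local time slot v % block within that block."""
    return v // block, v % block


def lpgs(v, nproc):
    """LPGS: locally parallel, globally sequential.
    Virtual cell v is handled by physical cell v % nproc,
    during sequential pass v // nproc over the array."""
    return v % nproc, v // nproc


# Map 8 virtual cells onto 4 physical cells (block size 2):
V, P = 8, 4
print([lsgp(v, V // P) for v in range(V)])
# LSGP: cells 0,1 share PE 0; cells 2,3 share PE 1; ...
print([lpgs(v, P) for v in range(V)])
# LPGS: cells 0..3 run in parallel (pass 0), then cells 4..7 (pass 1)
```

In higher dimensions the role of the block size is played by a lattice in k-space, but the same trade-off persists: LSGP keeps data local at the cost of per-cell sequentialization, whereas LPGS keeps the fine-grained parallelism at the cost of streaming intermediate results between passes.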


References

  1. H.T. Kung and C.E. Leiserson, “Systolic arrays (for VLSI),” in Sparse Matrix Proc., Soc. Ind. App. Math., Duff et al. (Eds.), pp. 256–282, 1979.

  2. C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, Reading, MA, 1980.

  3. L. Guibas, H.T. Kung, and C.D. Thompson, “Direct VLSI implementation of combinatorial algorithms,” Proc. Conf. on Very Large Scale Integration: Architecture, Design and Fabrication, pp. 509–525, 1979.

  4. H.T. Kung, “Why systolic architectures?,” Computer, Vol. 15, pp. 37–46, 1982.

  5. S.Y. Kung, VLSI Array Processors, Prentice Hall, Englewood Cliffs, NJ, 1987.

  6. R.H. Kuhn, “Transforming algorithms for single-stage and VLSI architectures,” Proc. Workshop Interconnection Networks Parallel Distributed Processing, IEEE CH1560-2, pp. 11–19, 1980.

  7. D.I. Moldovan, “On the analysis and synthesis of VLSI algorithms,” IEEE Transactions on Computers, Vol. C-31, No. 11, pp. 1121–1126, 1982.

  8. P. Quinton, “Automatic synthesis of systolic arrays from uniform recurrent equations,” IEEE 11th Int. Sym. on Computer Architecture, pp. 208–214, 1984.

  9. W.L. Miranker and A. Winkler, “Spacetime representations of computational structures,” Computing, Vol. 32, pp. 93–114, 1984.

  10. P.R. Cappello and K. Steiglitz, “Unifying VLSI designs with linear transformations on space-time,” Adv. Comput. Res., Vol. 2, pp. 23–65, 1984.

  11. S.K. Rao, “Regular iterative algorithms and their implementations on processor arrays,” PhD thesis, Stanford University, Stanford, CA, 1985.

  12. Y. Wong and J.-M. Delosme, “Optimal systolic implementation of n-dimensional recurrences,” IEEE Proc. ICCD, pp. 618–621, 1985.

  13. Y. Wong and J.-M. Delosme, “Optimal systolic implementation of n-dimensional recurrences,” Techn. Report, Computer Engineering, Yale University, New Haven, CT, 1985.

  14. P. Lee and Z.M. Kedem, “Synthesizing linear array algorithms from nested for loop algorithms,” IEEE Transactions on Computers, Vol. C-37, No. 12, pp. 1578–1597, 1988.

  15. R.M. Karp, R.E. Miller, and S. Winograd, “The organization of computations for uniform recurrence equations,” Journal of the ACM, Vol. 14, No. 3, pp. 563–590, July 1967.

  16. L. Lamport, “The parallel execution of DO loops,” Commun. ACM, pp. 83–93, 1974.

  17. K.-H. Zimmermann, “Linear mappings of n-dimensional recurrences onto k-dimensional systolic arrays,” J. VLSI Signal Processing, Vol. 12, No. 2, pp. 187–202, 1996.

  18. K.-H. Zimmermann and W. Achtziger, “Finding space-time transformations for uniform recurrences via branching parametric linear programming,” J. VLSI Signal Processing, Vol. 15, No. 3, pp. 259–274, 1997.

  19. J.A.B. Fortes and F. Parisi-Presicce, “Optimal linear schedules for the parallel execution of algorithms,” Int. Conf. on Parallel Processing, pp. 322–328, 1984.

  20. W. Shang and J.A.B. Fortes, “Time optimal linear schedules for algorithms with uniform dependencies,” IEEE Transactions on Computers, Vol. 40, No. 6, pp. 723–742, 1991.

  21. P. Feautrier, “Some efficient solutions to the affine scheduling problem. Part I. One-dimensional time,” Int. J. of Parallel Programming, Vol. 21, No. 5, pp. 313–347, 1992.

  22. A. Darte, L. Khachiyan, and Y. Robert, “Linear scheduling is close to optimality,” Int. Conf. on Application Specific Array Processors, IEEE Computer Soc. Press, pp. 37–46, 1992.

  23. P. Feautrier, “Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time,” Int. J. of Parallel Programming, Vol. 21, No. 6, pp. 389–420, 1992.

  24. Y. Wong and J.-M. Delosme, “Optimization of computation time for systolic arrays,” IEEE Transactions on Computers, Vol. 41, No. 2, pp. 159–177, 1992.

  25. K.-H. Zimmermann and W. Achtziger, “On time optimal implementation of uniform recurrences onto array processors via quadratic programming,” J. VLSI Signal Processing (to appear).

  26. P. Clauss, C. Mongenet, and G.-R. Perrin, “Calculus of space-optimal mappings of systolic algorithms on processor arrays,” J. VLSI Signal Processing, Vol. 4, pp. 27–36, 1992.

  27. X. Zhong, S. Rajopadhye, and I. Wong, “Systematic generation of linear allocation functions in systolic array design,” J. VLSI Signal Processing, Vol. 4, pp. 279–293, 1992.

  28. D.I. Moldovan and J.A.B. Fortes, “Partitioning and mapping algorithms into fixed size systolic arrays,” IEEE Transactions on Computers, Vol. 35, No. 1, pp. 1–12, 1986.

  29. K. Hwang and Y.H. Chung, “Partitioned matrix algorithms for VLSI arithmetic systems,” IEEE Transactions on Computers, Vol. C-31, No. 12, pp. 1214–1224, 1982.

  30. L. Johnson, “Optimal partitioning schemes for wavefront/systolic array processors,” Proc. MIT Conf. Advanced Research VLSI, 1982.

  31. D. Heller, “Partitioning big matrices for small systolic arrays,” in VLSI and Modern Signal Processing, S.Y. Kung, H.J. Whitehouse, and T. Kailath (Eds.), Prentice Hall, Englewood Cliffs, NJ, 1985.

  32. K. Jainandunsing, “Optimal partitioning scheme for wavefront/systolic array processors,” Proc. IEEE Symp. on Circuits and Systems, pp. 940–943, 1986.

  33. H.W. Nelis and E.F. Deprettere, “Automatic design and partitioning of systolic/wavefront arrays for VLSI,” Circuits Systems Signal Processing, Vol. 7, No. 2, 1988.

  34. K. Jainandunsing, “Parallel algorithms for solving systems of linear equations and their mapping on systolic arrays,” PhD Thesis, Delft Univ. of Technology, Delft, The Netherlands, 1989.

  35. J. Bu, E.F. Deprettere, and P. Dewilde, “A design methodology for fixed-size systolic arrays,” Int. Conf. on Application Specific Array Processors, IEEE Computer Soc. Press, pp. 591–600, 1990.

  36. J. Bu, “Systematic design of regular VLSI processor arrays,” PhD Thesis, Delft University of Technology, Delft, The Netherlands, 1990.

  37. A. Darte and J.-M. Delosme, “Partitioning for array processors,” Techn. Report 90-23, Laboratoire de l'Informatique du Parallelisme, Ecole Normale Superieure De Lyon, France, Oct. 1990.

  38. J. Bu and E.F. Deprettere, “Processor clustering for the design of optimal fixed-size systolic arrays,” '91, pp. 402–413, 1991.

  39. V. van Dongen, “Mapping uniform recurrences onto small size arrays,” Proc. PARLE '91, LNCS, pp. 190–208, 1991.

  40. A. Darte, “Regular partitioning for synthesizing fixed-size systolic arrays,” Integration, Vol. 12, pp. 293–304, 1991.

  41. X. Zhong and S. Rajopadhye, “Quasi-linear allocation functions for efficient array design,” J. VLSI Signal Processing, Vol. 4, pp. 97–110, 1992.

  42. K.-H. Zimmermann, “An optimal partitioning method for parallel algorithms: LSGP,” in Genetic, Chaotic and Parallel Programming: The Sixth Generation, B. Soucek and IRIS Group (Eds.), Wiley & Sons, New York, pp. 233–266, 1992.

  43. J.-P. Serre, A Course in Arithmetic, Springer, New York, 1973.

  44. K.-H. Zimmermann, T.-C. Lee, and S.-Y. Kung, “On partitioning and fault tolerance issues for neural array processors,” J. VLSI Signal Processing, Vol. 6, pp. 85–94, 1993.

  45. J. Teich and L. Thiele, “A transformative approach to the partitioning of processor arrays,” Proc. ASAP '92, pp. 4–20, 1992.

  46. A. Suarez, J.M. Llaberia, and A. Fernandez, “Scheduling partitions in systolic algorithms,” Proc. ASAP '92, pp. 619–633, 1992.

  47. P. Kuchibhotla and B.D. Rao, “Efficient scheduling methods for partitioned systolic algorithms,” Proc. ASAP '92, pp. 649–663, 1992.

  48. The Transputer Databook, Inmos, Bristol, 1989.

  49. K. Hwang and F. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill, 1984.

  50. F. Irigoin and R. Triolet, “Supernode partitioning,” Proc. SIGPLAN, San Diego, 1988, pp. 319–329.

  51. J.-P. Sheu and T.-H. Tai, “Partitioning and mapping nested loops on multiprocessor systems,” IEEE Transactions on Parallel and Distributed Systems, Vol. 2, pp. 430–439, 1991.

  52. W. Shang, M.T. O'Keefe, and J.A.B. Fortes, “On loop transformations for generalized cycle shrinking,” IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 2, pp. 193–204, 1994.

  53. J.W.S. Cassels, An Introduction to the Geometry of Numbers, Springer, Berlin, 1959.

  54. A. Schrijver, Theory of Linear and Integer Programming, Wiley-Interscience, New York, 1986.

  55. P. Quinton, “The systematic design of systolic arrays,” in Automata Networks in Computer Science, Princeton Univ. Press, Princeton, pp. 229–260, 1987.

  56. V. van Dongen, “Quasi-regular arrays: Definition and design methodology,” in Systolic Array Processors, J. McCanny, J. McWhirter, and E. Swartzlander (Eds.), Int. Conf. on Systolic Arrays, Englewood Cliffs, NJ, Prentice Hall, pp. 126–135, 1989.

  57. B. McDonald, Finite Rings With Identity, Marcel Dekker, New York, 1974.

  58. D.C. Kozen, The Design and Analysis of Algorithms, Springer, 1992.

  59. K.-H. Zimmermann and W. Achtziger, “On time optimal piecewise linear schedules for LSGP- and LPGS-partitionings of array processors via quadratic programming,” Preprint, Univ. of Erlangen, 1996.

  60. W.P. Burleson, “The partitioning problem on VLSI arrays: I/O and local memory complexity,” Proc. IEEE ICASSP, pp. 1217–1220, 1991.

About this article

Cite this article

Zimmermann, KH. A Unifying Lattice-Based Approach for the Partitioning of Systolic Arrays via LPGS and LSGP. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 17, 21–41 (1997). https://doi.org/10.1023/A:1007944932429
