A Unifying Lattice-Based Approach for the Partitioning of Systolic Arrays via LPGS and LSGP

Published in: Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology

Abstract

Various methods for synthesizing systolic arrays from signal and image processing algorithms have been developed in the past few years. In this paper, we propose a technique for the partitioning problem, i.e., the problem of synthesizing systolic arrays whose size does not match the problem size. Our technique generalizes most of the known lattice-based approaches to the partitioning problem and combines the multiprojection method for the synthesis of systolic arrays with the locally sequential-globally parallel (LSGP) and locally parallel-globally sequential (LPGS) partitioning schemes. Starting from (1) a k-dimensional large-size systolic array obtained from a system of n-dimensional uniform recurrences by a space-time transformation and (2) an arbitrary lattice in k-space inducing a partitioning of the array into subarrays, a small-size systolic array with a scalar-valued system clock is constructed via the LSGP or LPGS paradigm. In particular, the allocation function for the small-size array can be written in closed form, and the timing function is obtained by simple greedy algorithms from timing functions for the subdomains, the sets of operations performed by the subarrays. In this way, the problem of finding optimal timing functions can in various cases be reduced to finding optimal timing functions for the subdomains. For problems of large size, these greedy algorithms seem preferable to existing integer or non-convex programming formulations for finding (sub-)optimal timing functions. We also provide several new results: a necessary and sufficient condition for the existence of counter data flow, a formal relationship between partitionings of the processor space and the index space of the uniform recurrences in terms of counter data flow, and the structural equivalence between the lattice-based LSGP and LPGS schemes applied to the partitioning of index and processor space.
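The difference between the two partitioning paradigms can be illustrated on a one-dimensional toy case. The sketch below (our own notation, not the paper's closed-form allocation and timing functions) maps the cells of a large "virtual" systolic array onto a fixed number of physical cells: under LSGP, each physical cell sequentially emulates one contiguous block of virtual cells, while under LPGS, consecutive virtual cells are spread across the physical cells in parallel and the blocks are processed as sequential passes.

```python
def lsgp(v, block):
    """LSGP: locally sequential, globally parallel.
    Virtual cell v is handled by physical cell v // block,
    in local time slot v % block within that block."""
    return v // block, v % block


def lpgs(v, nproc):
    """LPGS: locally parallel, globally sequential.
    Virtual cell v is handled by physical cell v % nproc,
    during sequential pass v // nproc over the array."""
    return v % nproc, v // nproc


# Map 8 virtual cells onto 4 physical cells (block size 2):
V, P = 8, 4
print([lsgp(v, V // P) for v in range(V)])
# LSGP: cells 0,1 share PE 0; cells 2,3 share PE 1; ...
print([lpgs(v, P) for v in range(V)])
# LPGS: cells 0..3 run in parallel (pass 0), then cells 4..7 (pass 1)
```

In higher dimensions the role of the block size is played by a lattice in k-space, but the same trade-off persists: LSGP keeps data local at the cost of per-cell sequentialization, whereas LPGS keeps the fine-grained parallelism at the cost of streaming intermediate results between passes.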


References

  1. H.T. Kung and C.E. Leiserson, “Systolic arrays (for VLSI),” in Sparse Matrix Proc., Soc. Ind. App. Math., Duff et al. (Eds.), pp. 256–282, 1979.

  2. C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, Reading, MA, 1980.

  3. L. Guibas, H.T. Kung, and C.D. Thompson, “Direct VLSI implementation of combinatorial algorithms,” Proc. Conf. on Very Large Scale Integration: Architecture, Design and Fabrication, pp. 509–525, 1979.

  4. H.T. Kung, “Why systolic architectures?,” Computer, Vol. 15, pp. 37–46, 1982.

  5. S.Y. Kung, VLSI Array Processors, Prentice Hall, Englewood Cliffs, NJ, 1987.

  6. R.H. Kuhn, “Transforming algorithms for single-stage and VLSI architectures,” Proc. Workshop Interconnection Networks Parallel Distributed Processing, IEEE CH1560-2, pp. 11–19, 1980.

  7. D.I. Moldovan, “On the analysis and synthesis of VLSI algorithms,” IEEE Transactions on Computers, Vol. C-31, No. 11, pp. 1121–1126, 1982.

  8. P. Quinton, “Automatic synthesis of systolic arrays from uniform recurrent equations,” IEEE 11th Int. Sym. on Computer Architecture, pp. 208–214, 1984.

  9. W.L. Miranker and A. Winkler, “Spacetime representations of computational structures,” Computing, Vol. 32, pp. 93–114, 1984.

  10. P.R. Cappello and K. Steiglitz, “Unifying VLSI designs with linear transformations on space-time,” Adv. Comput. Res., Vol. 2, pp. 23–65, 1984.

  11. S.K. Rao, “Regular iterative algorithms and their implementations on processor arrays,” PhD thesis, Stanford University, Stanford, CA, 1985.

  12. Y. Wong and J.-M. Delosme, “Optimal systolic implementation of n-dimensional recurrences,” IEEE Proc. ICCD, pp. 618–621, 1985.

  13. Y. Wong and J.-M. Delosme, “Optimal systolic implementation of n-dimensional recurrences,” Techn. Report, Computer Engineering, Yale University, New Haven, CT, 1985.

  14. P. Lee and Z.M. Kedem, “Synthesizing linear array algorithms from nested for loop algorithms,” IEEE Transactions on Computers, Vol. C-37, No. 12, pp. 1578–1597, 1988.

  15. R.M. Karp, R.E. Miller, and S. Winograd, “The organization of computations for uniform recurrence equations,” Journal of the ACM, Vol. 14, No. 3, pp. 563–590, July 1967.

  16. L. Lamport, “The parallel execution of DO loops,” Commun. ACM, pp. 83–93, 1974.

  17. K.-H. Zimmermann, “Linear mappings of n-dimensional recurrences onto k-dimensional systolic arrays,” J. VLSI Signal Processing, Vol. 12, No. 2, pp. 187–202, 1996.

  18. K.-H. Zimmermann and W. Achtziger, “Finding space-time transformations for uniform recurrences via branching parametric linear programming,” J. VLSI Signal Processing, Vol. 15, No. 3, pp. 259–274, 1997.

  19. J.A.B. Fortes and F. Parisi-Presicce, “Optimal linear schedules for the parallel execution of algorithms,” Int. Conf. on Parallel Processing, pp. 322–328, 1984.

  20. W. Shang and J.A.B. Fortes, “Time optimal linear schedules for algorithms with uniform dependencies,” IEEE Transactions on Computers, Vol. 40, No. 6, pp. 723–742, 1991.

  21. P. Feautrier, “Some efficient solutions to the affine scheduling problem. Part I. One-dimensional time,” Int. J. of Parallel Programming, Vol. 21, No. 5, pp. 313–347, 1992.

  22. A. Darte, L. Khachiyan, and Y. Robert, “Linear scheduling is close to optimality,” Int. Conf. on Application Specific Array Processors, IEEE Computer Soc. Press, pp. 37–46, 1992.

  23. P. Feautrier, “Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time,” Int. J. of Parallel Programming, Vol. 21, No. 6, pp. 389–420, 1992.

  24. Y. Wong and J.-M. Delosme, “Optimization of computation time for systolic arrays,” IEEE Transactions on Computers, Vol. 41, No. 2, pp. 159–177, 1992.

  25. K.-H. Zimmermann and W. Achtziger, “On time optimal implementation of uniform recurrences onto array processors via quadratic programming,” J. VLSI Signal Processing (to appear).

  26. P. Clauss, C. Mongenet, and G.-R. Perrin, “Calculus of space-optimal mappings of systolic algorithms on processor arrays,” J. VLSI Signal Processing, Vol. 4, pp. 27–36, 1992.

  27. X. Zhong, S. Rajopadhye, and I. Wong, “Systematic generation of linear allocation functions in systolic array design,” J. VLSI Signal Processing, Vol. 4, pp. 279–293, 1992.

  28. D.I. Moldovan and J.A.B. Fortes, “Partitioning and mapping algorithms into fixed size systolic arrays,” IEEE Transactions on Computers, Vol. 35, No. 1, pp. 1–12, 1986.

  29. K. Hwang and Y.H. Chung, “Partitioned matrix algorithms for VLSI arithmetic systems,” IEEE Transactions on Computers, Vol. C-31, No. 12, pp. 1214–1224, 1982.

  30. L. Johnson, “Optimal partitioning schemes for wavefront/systolic array processors,” Proc. MIT Conf. Advanced Research VLSI, 1982.

  31. D. Heller, “Partitioning big matrices for small systolic arrays,” in VLSI and Modern Signal Processing, S.Y. Kung, H.J. Whitehouse, and T. Kailath (Eds.), Prentice Hall, Englewood Cliffs, NJ, 1985.

  32. K. Jainandunsing, “Optimal partitioning scheme for wavefront/systolic array processors,” Proc. IEEE Symp. on Circuits and Systems, pp. 940–943, 1986.

  33. H.W. Nelis and E.F. Deprettere, “Automatic design and partitioning of systolic/wavefront arrays for VLSI,” Circuits Systems Signal Processing, Vol. 7, No. 2, 1988.

  34. K. Jainandunsing, “Parallel algorithms for solving systems of linear equations and their mapping on systolic arrays,” PhD Thesis, Delft Univ. of Technology, Delft, The Netherlands, 1989.

  35. J. Bu, E.F. Deprettere, and P. Dewilde, “A design methodology for fixed-size systolic arrays,” Int. Conf. on Application Specific Array Processors, IEEE Computer Soc. Press, pp. 591–600, 1990.

  36. J. Bu, “Systematic design of regular VLSI processor arrays,” PhD Thesis, Delft University of Technology, Delft, The Netherlands, 1990.

  37. A. Darte and J.-M. Delosme, “Partitioning for array processors,” Techn. Report 90-23, Laboratoire de l'Informatique du Parallelisme, Ecole Normale Superieure De Lyon, France, Oct. 1990.

  38. J. Bu and E.F. Deprettere, “Processor clustering for the design of optimal fixed-size systolic arrays,” '91, pp. 402–413, 1991.

  39. V. van Dongen, “Mapping uniform recurrences onto small size arrays,” Proc. PARLE '91, LNCS, pp. 190–208, 1991.

  40. A. Darte, “Regular partitioning for synthesizing fixed-size systolic arrays,” Integration, Vol. 12, pp. 293–304, 1991.

  41. X. Zhong and S. Rajopadhye, “Quasi-linear allocation functions for efficient array design,” J. VLSI Signal Processing, Vol. 4, pp. 97–110, 1992.

  42. K.-H. Zimmermann, “An optimal partitioning method for parallel algorithms: LSGP,” in Genetic, Chaotic and Parallel Programming: The Sixth Generation, B. Soucek and IRIS Group (Eds.), Wiley & Sons, New York, pp. 233–266, 1992.

  43. J.-P. Serre, A Course in Arithmetic, Springer, New York, 1973.

  44. K.-H. Zimmermann, T.-C. Lee, and S.-Y. Kung, “On partitioning and fault tolerance issues for neural array processors,” J. VLSI Signal Processing, Vol. 6, pp. 85–94, 1993.

  45. J. Teich and L. Thiele, “A transformative approach to the partitioning of processor arrays,” Proc. ASAP '92, pp. 4–20, 1992.

  46. A. Suarez, J.M. Llaberia, and A. Fernandez, “Scheduling partitions in systolic algorithms,” Proc. ASAP '92, pp. 619–633, 1992.

  47. P. Kuchibhotla and B.D. Rao, “Efficient scheduling methods for partitioned systolic algorithms,” Proc. ASAP '92, pp. 649–663, 1992.

  48. The Transputer Databook, Inmos, Bristol, 1989.

  49. K. Hwang and F. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill, 1984.

  50. F. Irigoin and R. Triolet, “Supernode partitioning,” Proc. SIGPLAN, San Diego, 1988, pp. 319–329.

  51. J.-P. Sheu and T.-H. Tai, “Partitioning and mapping nested loops on multiprocessor systems,” IEEE Transactions on Parallel and Distributed Systems, Vol. 2, pp. 430–439, 1991.

  52. W. Shang, M.T. O'Keefe, and J.A.B. Fortes, “On loop transformations for generalized cycle shrinking,” IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 2, pp. 193–204, 1994.

  53. J.W.S. Cassels, An Introduction to the Geometry of Numbers, Springer, Berlin, 1959.

  54. A. Schrijver, Theory of Linear and Integer Programming, Wiley-Interscience, New York, 1986.

  55. P. Quinton, “The systematic design of systolic arrays,” in Automata Networks in Computer Science, Princeton Univ. Press, Princeton, pp. 229–260, 1987.

  56. V. van Dongen, “Quasi-regular arrays: Definition and design methodology,” in Systolic Array Processors, J. McCanny, J. McWhirter, and E. Swartzlander (Eds.), Int. Conf. on Systolic Arrays, Englewood Cliffs, NJ, Prentice Hall, pp. 126–135, 1989.

  57. B. McDonald, Finite Rings With Identity, Marcel Dekker, New York, 1974.

  58. D.C. Kozen, The Design and Analysis of Algorithms, Springer, 1992.

  59. K.-H. Zimmermann and W. Achtziger, “On time optimal piecewise linear schedules for LSGP- and LPGS-partitionings of array processors via quadratic programming,” Preprint, Univ. of Erlangen, 1996.

  60. W.P. Burleson, “The partitioning problem on VLSI arrays: I/O and local memory complexity,” Proc. IEEE ICASSP, pp. 1217–1220, 1991.

About this article

Cite this article

Zimmermann, KH. A Unifying Lattice-Based Approach for the Partitioning of Systolic Arrays via LPGS and LSGP. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 17, 21–41 (1997). https://doi.org/10.1023/A:1007944932429
