Abstract
Many researchers have observed that systolic arrays are well suited to certain high-speed computations. Using a formal methodology, we present a design for a single, simple, programmable linear systolic array capable of solving a large number of problems drawn from a variety of applications. The methodology applies to problems solvable by sequential algorithms that can be specified as nested for-loops of arbitrary depth. The algorithms of this form that can be computed on the array presented in this paper include 25 algorithms dealing with signal and image processing, algebraic computations, matrix arithmetic, pattern matching, database operations, sorting, and transitive closure. Assuming bounded I/O, the time and storage complexities are optimal for 18 of these algorithms; therefore, no improvement can be expected from dedicated special-purpose linear systolic arrays designed for individual algorithms. We also describe a second design that, given a sufficiently large local memory and allowing data to be preloaded and unloaded, achieves an optimal processor/time product.
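To make the idea of running a nested for-loop on a linear array concrete, the following is a minimal sketch, not the paper's array or its synthesis procedure. It simulates a generic linear systolic array for the two-level loop `for i: for j: y[i] += A[i][j] * x[j]`, using an assumed space-time mapping in which loop iteration (i, j) runs on processor j at time step i + j. The function name `systolic_matvec` and this particular mapping are hypothetical choices for illustration only.

```python
from typing import List, Optional, Tuple

# One in-flight partial result: (row index i, partial sum of y[i])
Packet = Optional[Tuple[int, float]]


def systolic_matvec(A: List[List[float]], x: List[float]) -> List[float]:
    """Cycle-level simulation of y = A x on a linear array of len(x) PEs.

    Assumed mapping (illustrative, not taken from the paper):
      processor(i, j) = j, time(i, j) = i + j.
    PE j keeps x[j] resident; partial sums of y flow left to right.
    Requires len(x) >= 1.
    """
    m, n = len(A), len(x)
    out: List[Packet] = [None] * n  # value latched by each PE last cycle
    y = [0.0] * m
    for t in range(m + n - 1):  # last row finishes at t = (m - 1) + (n - 1)
        new_out: List[Packet] = [None] * n
        for j in range(n):
            if j == 0:
                # Row i = t enters the array at PE 0 with partial sum 0
                incoming: Packet = (t, 0.0) if t < m else None
            else:
                # PE j receives what its left neighbour produced last cycle
                incoming = out[j - 1]
            if incoming is not None:
                i, partial = incoming
                assert t == i + j  # the assumed timing function holds
                # Inner-loop body executed by PE j for iteration (i, j)
                new_out[j] = (i, partial + A[i][j] * x[j])
        out = new_out
        # The rightmost PE emits a completed row sum
        if out[n - 1] is not None:
            i, total = out[n - 1]
            y[i] = total
    return y
```

For example, `systolic_matvec([[1, 2], [3, 4], [5, 6]], [1, 1])` returns `[3.0, 7.0, 11.0]`. Because each PE keeps its x[j] resident while partial sums stream through, the m·n loop iterations finish in m + n - 1 cycles on n processors; the linear timing function i + j is what guarantees each operand meets its partial sum at the right PE and cycle.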
Additional information
An earlier version of this paper was presented at Supercomputing '88.
This work was partially supported by ONR under contract N00014-85-K-0046 and by NSF under grant CCR-8906949.
Cite this article
Lee, P., Kedem, Z.M. On high-speed computing with a programmable linear array. J Supercomput 4, 223–249 (1990). https://doi.org/10.1007/BF00127833