Abstract
We describe a class-specific linear pseudosystolic array with K processing elements, suitable for partitioned execution of matrix algorithms. The array achieves high efficiency, exploits pipelining within cells in a simple manner, has an off-cell communication rate lower than its computation rate, needs only a small amount of storage inside each cell (whose size is independent of problem size), and uses external storage. It has been derived by applying the multimesh graph (MMG) method to a large class of matrix algorithms.
Processing elements (cells) use the decoupled access/execute model of computation, which requires two programs in each cell: one controlling the execution of operations and the other the data transfers. All storage modules in the array are accessed as FIFO queues, without the need for addressing mechanisms. We describe the proposed instruction set, which includes single-instruction loops with no overhead, and block-loops with just one extra instruction. Moreover, cells can nest up to three loops with no added overhead. These features are needed for mapping algorithms with the MMG method.
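As a rough illustration of the decoupled access/execute idea with FIFO-only storage, the following Python sketch models one cell with two separate programs that communicate through queues. It is a minimal sketch under stated assumptions, not the instruction set of the array: the function names (`access_program`, `execute_program`, `run_cell`), the multiply-add operation, and the sequential scheduling of the two programs are all hypothetical choices made for illustration. In hardware the two programs would run concurrently.

```python
from collections import deque


def access_program(a, x, y, operand_queue):
    """Access side (hypothetical): stream operands into the FIFO
    in exactly the order the execute side will consume them."""
    for ai, xi, yi in zip(a, x, y):
        operand_queue.append((ai, xi, yi))


def execute_program(operand_queue, result_queue):
    """Execute side (hypothetical): pop operands in FIFO order and
    compute a multiply-add; no addresses are ever generated here."""
    while operand_queue:
        ai, xi, yi = operand_queue.popleft()
        result_queue.append(ai * xi + yi)


def run_cell(a, x, y):
    """Run both programs of one cell. They run one after the other
    here for simplicity; real decoupled hardware overlaps them."""
    operand_queue, result_queue = deque(), deque()
    access_program(a, x, y, operand_queue)
    execute_program(operand_queue, result_queue)
    return list(result_queue)
```

For example, `run_cell([1, 2, 3], [4, 5, 6], [7, 8, 9])` computes the element-wise multiply-add of the three input streams. The point of the sketch is that the execute side only pops and pushes, so data ordering is fixed entirely by the access side.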
Mapping onto this array is illustrated using the LU-decomposition algorithm, and results obtained with other algorithms are also given. Estimates of performance indicate that it is possible to achieve over 85% efficiency, with low requirements in communication bandwidth and storage.
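For readers unfamiliar with the algorithm used as the mapping example, a plain sequential LU decomposition can be sketched as follows. This is an assumption-laden reference version (Doolittle form, no pivoting, hypothetical function name `lu_decompose`); it does not show the partitioned MMG mapping onto the array, only the computation being mapped.

```python
def lu_decompose(a):
    """Return (lower, upper) with a == lower @ upper.

    Doolittle form without pivoting, so it fails on a zero pivot.
    This is a sequential reference sketch, not the array mapping.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]
    for k in range(n):                      # pivot step
        if upper[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):           # rows below the pivot
            m = upper[i][k] / upper[k][k]
            lower[i][k] = m
            for j in range(k, n):           # eliminate along the row
                upper[i][j] -= m * upper[k][j]
    return lower, upper
```

Calling `lu_decompose([[4, 3], [6, 3]])` yields a unit lower-triangular factor and an upper-triangular factor whose product reproduces the input matrix.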
Additional information
This research has been supported in part by Universidad de Concepción (Grant DI-20.92.21, “Linear array for matrix algorithms”) and by NSF (Grant MIP-8813340, “Composite operations using on-line arithmetic in application-specific parallel architectures”).
Cite this article
Moreno, J.H., Figueroa, M.E. & Lang, T. Linear pseudosystolic array for partitioned matrix algorithms. J VLSI Sign Process Syst Sign Image Video Technol 3, 201–214 (1991). https://doi.org/10.1007/BF00925831
DOI: https://doi.org/10.1007/BF00925831