Abstract

We describe a class-specific linear pseudosystolic array, with K processing elements, suitable for partitioned execution of matrix algorithms. This array achieves high efficiency, exploits pipelining within cells in a simple manner, has an off-cell communication rate lower than its computation rate, needs only a small amount of storage inside each cell (whose size is independent of problem size), and uses external storage. The array has been derived by applying the multimesh graph (MMG) method to a large class of matrix algorithms.

Processing elements (cells) use the decoupled access/execute model of computation, which requires two programs in each cell: one controlling the execution of operations and the other the data transfers. All storage modules in the array are accessed as FIFO queues, without the need for addressing mechanisms. We describe the proposed instruction set, which includes single-instruction loops with no overhead, and block-loops with just one extra instruction. Moreover, cells can nest up to three loops with no added overhead. These features are needed for mapping algorithms with the MMG method.
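The decoupled access/execute idea can be illustrated with a minimal Python sketch. It is not the instruction set proposed in the paper: the function names, the two-queue layout, and the multiply-accumulate workload are all assumptions made for illustration. One function plays the role of the data-transfer program and the other the role of the operation program, and they communicate only through FIFO queues, so the execute side never computes an address. The sketch runs the two programs one after the other, whereas hardware would overlap them.

```python
from collections import deque

def access_program(a, b, queue_a, queue_b):
    """Hypothetical data-transfer program: streams operands from
    external storage (plain lists here) into the cell's FIFO queues."""
    for x, y in zip(a, b):
        queue_a.append(x)
        queue_b.append(y)

def execute_program(queue_a, queue_b, queue_out, count):
    """Hypothetical operation program: consumes operands strictly in
    FIFO order and accumulates a sum of products, using no addresses."""
    acc = 0
    for _ in range(count):  # stands in for a single-instruction loop
        acc += queue_a.popleft() * queue_b.popleft()
    queue_out.append(acc)

def run_cell(a, b):
    """Run both programs of one illustrative cell and return its result."""
    queue_a, queue_b, queue_out = deque(), deque(), deque()
    access_program(a, b, queue_a, queue_b)
    execute_program(queue_a, queue_b, queue_out, len(a))
    return queue_out.popleft()

print(run_cell([1, 2, 3], [4, 5, 6]))
```

Because the queues are the only interface between the two programs, changing how data is fetched affects only the access side, which is the separation the decoupled model relies on.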

Mapping onto this array is illustrated using the LU-decomposition algorithm, and results obtained with other algorithms are also given. Estimates of performance indicate that it is possible to achieve over 85% efficiency, with low requirements in communication bandwidth and storage.
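For readers unfamiliar with the example algorithm, the following is a minimal sketch of sequential LU decomposition without pivoting. It shows only the computation being mapped, not the MMG mapping or the partitioned execution on K cells described in the paper. The function names and the no-pivoting assumption are choices made for this illustration.

```python
def lu_decompose(a):
    """Factor a square matrix (list of rows) into unit lower-triangular L
    and upper-triangular U with L * U == a. Assumes no pivoting is needed."""
    n = len(a)
    u = [list(map(float, row)) for row in a]
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        if u[k][k] == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]  # elimination multiplier
            for j in range(k, n):
                u[i][j] -= l[i][k] * u[k][j]  # update the remaining row
    return l, u

def matmul(x, y):
    """Plain matrix product, used here only to check the factorization."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

lower, upper = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
print(lower, upper)
```

The three nested loops over k, i, and j are the kind of regular loop nest that the abstract's hardware loop support targets.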

Additional information

This research has been supported in part by Universidad de Concepción (Grant DI-20.92.21, “Linear array for matrix algorithms”) and by NSF (Grant MIP-8813340, “Composite operations using on-line arithmetic in application-specific parallel architectures”).

About this article

Cite this article

Moreno, J.H., Figueroa, M.E. & Lang, T. Linear pseudosystolic array for partitioned matrix algorithms. J VLSI Sign Process Syst Sign Image Video Technol 3, 201–214 (1991). https://doi.org/10.1007/BF00925831

  • DOI: https://doi.org/10.1007/BF00925831
