Linear pseudosystolic array for partitioned matrix algorithms

Moreno, Jaime H.; Figueroa, Miguel E.; Lang, Tomas

doi:10.1007/BF00925831

Abstract

We describe a class-specific linear pseudosystolic array, with K processing elements, suitable for partitioned execution of matrix algorithms. This array achieves high efficiency, exploits pipelining within cells in a simple manner, has an off-cell communication rate lower than its computation rate, requires only a small amount of storage inside each cell (whose size is independent of problem size), and uses external storage. This array has been derived from the application of the multimesh graph (MMG) method to a large class of matrix algorithms.
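
To make the idea of partitioned execution on K cells concrete, the following is a minimal sketch in Python. It is not the array described in the article: the function name `partitioned_matvec`, the choice of a matrix-vector product, and the way data moves between "external storage" and the cells are all assumptions made for illustration. It only shows that a problem of any size can be processed in bands of at most K rows while each simulated cell keeps a fixed amount of state.

```python
from collections import deque


def partitioned_matvec(A, x, K):
    """Compute y = A x with K simulated cells, one band of at most K rows per pass.

    Hypothetical illustration: each cell holds a single accumulator, so the
    per-cell storage does not grow with the size of A.
    """
    n = len(A)
    y = []
    for start in range(0, n, K):          # one pass per band of rows
        band = A[start:start + K]
        acc = [0.0] * len(band)           # one accumulator per active cell
        stream = deque(x)                 # x is re-read from external storage
        j = 0
        while stream:
            xj = stream.popleft()         # next value entering the linear array
            for cell, row in enumerate(band):
                acc[cell] += row[j] * xj  # each cell performs one multiply-add
            j += 1
        y.extend(acc)                     # band results return to external storage
    return y
```

For example, `partitioned_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], K=2)` processes the three rows in two passes (two rows, then one).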

Processing elements (cells) use the decoupled access/execute model of computation, which requires two programs in each cell: one controlling the execution of operations and the other the data transfers. All storage modules in the array are accessed as FIFO queues, without the need for addressing mechanisms. We describe the proposed instruction set, which includes single-instruction loops with no overhead, and block-loops with just one extra instruction. Moreover, cells can nest up to three loops with no added overhead. These features are needed for mapping algorithms with the MMG method.
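
The decoupled access/execute idea can be sketched as two cooperating programs that share FIFO queues. The Python below is an illustrative assumption, not the proposed instruction set: the names `access_program`, `execute_program` and `run_cell`, the round-robin scheduling, and the inner-product workload are all hypothetical. The point it shows is that the execute side only ever reads queue heads and never computes an address.

```python
from collections import deque


def access_program(memory_a, memory_b, q_a, q_b):
    """Data-transfer program: pushes operands into FIFO queues in order of use."""
    for a, b in zip(memory_a, memory_b):
        q_a.append(a)
        q_b.append(b)
        yield  # give the other program a turn after each transfer


def execute_program(q_a, q_b, count, result):
    """Execution program: consumes queue heads; it uses no addresses at all."""
    acc = 0.0
    done = 0
    while done < count:
        if q_a and q_b:
            acc += q_a.popleft() * q_b.popleft()
            done += 1
        yield  # if a queue was empty, this turn is a stall
    result.append(acc)


def run_cell(memory_a, memory_b):
    """Interleave the two programs of one simulated cell until both finish."""
    q_a, q_b, result = deque(), deque(), []
    programs = [
        access_program(memory_a, memory_b, q_a, q_b),
        execute_program(q_a, q_b, len(memory_a), result),
    ]
    while programs:
        for program in list(programs):
            try:
                next(program)
            except StopIteration:
                programs.remove(program)
    return result[0]
```

Calling `run_cell([1, 2, 3], [4, 5, 6])` computes the inner product of the two operand streams. In this sketch the two programs alternate strictly; a real decoupled design would let the access side run ahead as far as queue capacity allows.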

Mapping onto this array is illustrated using the LU-decomposition algorithm, and results obtained with other algorithms are also given. Estimates of performance indicate that it is possible to achieve over 85% efficiency, with low requirements in communication bandwidth and storage.
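
As background for the LU-decomposition example, the following Python sketch shows a generic right-looking blocked LU factorization without pivoting. It is a textbook formulation used here only to illustrate what partitioning the algorithm into blocks means; it is not the MMG mapping from the article, and the names `blocked_lu`, `lu_in_place`, `multiply_lu` and the block size `b` are assumptions. Because it does no pivoting, it is only suitable for matrices that need none, such as diagonally dominant ones.

```python
def lu_in_place(A, lo, hi):
    """Unblocked LU (no pivoting) of the diagonal block with rows/cols lo..hi-1."""
    for k in range(lo, hi):
        for i in range(k + 1, hi):
            A[i][k] /= A[k][k]
            for j in range(k + 1, hi):
                A[i][j] -= A[i][k] * A[k][j]


def blocked_lu(A, b):
    """Return a copy of A overwritten with unit-lower L and upper U, packed."""
    n = len(A)
    A = [row[:] for row in A]
    for lo in range(0, n, b):
        hi = min(lo + b, n)
        lu_in_place(A, lo, hi)            # factor the diagonal block
        for j in range(hi, n):            # row panel: solve L11 * U12 = A12
            for i in range(lo, hi):
                for k in range(lo, i):
                    A[i][j] -= A[i][k] * A[k][j]
        for i in range(hi, n):            # column panel: solve L21 * U11 = A21
            for j in range(lo, hi):
                for k in range(lo, j):
                    A[i][j] -= A[i][k] * A[k][j]
                A[i][j] /= A[j][j]
        for i in range(hi, n):            # trailing update: A22 -= L21 * U12
            for j in range(hi, n):
                for k in range(lo, hi):
                    A[i][j] -= A[i][k] * A[k][j]
    return A


def multiply_lu(P):
    """Rebuild L * U from the packed factors, to check the factorization."""
    n = len(P)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else P[i][k]  # unit diagonal of L
                out[i][j] += l_ik * P[k][j]
    return out
```

A quick check is to factor a small diagonally dominant matrix with `blocked_lu(A, b)` for some block size `b` and confirm that `multiply_lu` of the result reproduces `A` up to rounding error.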

Author information

Authors and Affiliations

Departamento de Ingeniería Eléctrica, Universidad de Concepción, Casilla 53-C, Concepción, Chile
Jaime H. Moreno & Miguel E. Figueroa
Departamento Arquitectura de Computadores, Univ. Politècnica de Catalunya, Pau Gargallo 5, 08028, Barcelona, Spain
Tomas Lang

Additional information

This research has been supported in part by Universidad de Concepción (Grant DI-20.92.21, “Linear array for matrix algorithms”) and by NSF (Grant MIP-8813340, “Composite operations using on-line arithmetic in application-specific parallel architectures”).

About this article

Cite this article

Moreno, J.H., Figueroa, M.E. & Lang, T. Linear pseudosystolic array for partitioned matrix algorithms. J VLSI Sign Process Syst Sign Image Video Technol 3, 201–214 (1991). https://doi.org/10.1007/BF00925831

Received: 23 November 1990
Accepted: 15 January 1991
Published: 01 September 1991
Issue Date: September 1991
DOI: https://doi.org/10.1007/BF00925831
