A Systolic Design Methodology with Application to Full-Search Block-Matching Architectures

Chen, Yen-Kuang; Kung, S.Y.

doi:10.1023/A:1008012332212

Abstract

We present a systematic methodology for exploring design tradeoffs in array processors with respect to several emerging concerns: (1) high performance and high flexibility, (2) low cost and low power, (3) efficient memory usage, and (4) system-on-a-chip design, or ease of system integration. The methodology is algebra-based, so it can cope with high-dimensional data dependence. It consists of transformation rules on data dependency graphs (DGs) that facilitate flexible array designs. For example, two common partitioning approaches, LPGS and LSGP, can be unified under the methodology. It supports the design of high-speed, massively parallel processor arrays with efficient memory usage. More specifically, it leads to a novel systolic cache architecture comprising shift registers only (a cache without tags). To demonstrate how the methodology works, we present several systolic design examples based on the block-matching motion estimation algorithm (BMA). By multiprojecting a 4D DG of the BMA onto a 2D mesh, we can reconstruct several existing array processors. By multiprojecting a 6D DG of the BMA, we derive a novel 2D systolic array that features significantly improved rates of data reusability (96%) and processor utilization (99%).
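
For readers unfamiliar with full-search block matching, the following is a minimal sketch of the computation for a single block. It is not taken from the article. The function name `full_search_bma`, its parameters, and the sum of absolute differences (SAD) as the matching criterion are assumptions made for illustration. The four nested loops (displacements `u`, `v` and pixel offsets `i`, `j`) give one plausible reading of the 4D index space mentioned in the abstract; looping over all blocks of a frame would add two more indices.

```python
def full_search_bma(cur, ref, bx, by, n, p):
    """Exhaustively search displacements in [-p, p] x [-p, p].

    cur, ref : 2D lists (rows of pixel values) for current and reference frames
    bx, by   : column and row of the block's top-left corner in `cur`
    n        : block size (n x n pixels)
    p        : maximum displacement in each direction
    Returns (best_sad, (u, v)), or None if no candidate lies inside `ref`.
    SAD is an assumed matching criterion, not one stated in the abstract.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for u in range(-p, p + 1):            # vertical displacement
        for v in range(-p, p + 1):        # horizontal displacement
            # Skip candidate blocks that fall outside the reference frame.
            if by + u < 0 or by + u + n > height:
                continue
            if bx + v < 0 or bx + v + n > width:
                continue
            sad = 0
            for i in range(n):            # row inside the block
                for j in range(n):        # column inside the block
                    sad += abs(cur[by + i][bx + j] - ref[by + u + i][bx + v + j])
            # Keep the first candidate with the smallest SAD.
            if best is None or sad < best[0]:
                best = (sad, (u, v))
    return best
```

Every candidate displacement re-reads most of the reference pixels touched by its neighbours, which is the kind of data reuse that an array design can exploit.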

Author information

Authors and Affiliations

Department of Electrical Engineering, Princeton University, Princeton, NJ, 08544
Yen-Kuang Chen & S.Y. Kung

About this article

Cite this article

Chen, YK., Kung, S. A Systolic Design Methodology with Application to Full-Search Block-Matching Architectures. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 19, 51–77 (1998). https://doi.org/10.1023/A:1008012332212

Published: 01 May 1998
Issue Date: May 1998
DOI: https://doi.org/10.1023/A:1008012332212
