A Systolic Design Methodology with Application to Full-Search Block-Matching Architectures

Journal of VLSI signal processing systems for signal, image and video technology

Abstract

We present a systematic methodology that supports design tradeoffs for array processors across several emerging concerns: (1) high performance and high flexibility, (2) low cost and low power, (3) efficient memory usage, and (4) system-on-a-chip design, or ease of system integration. The methodology is algebraic, so it can cope with high-dimensional data dependences. It consists of transformation rules on data dependency graphs (DGs) that facilitate flexible array designs. For example, two common partitioning approaches, LPGS (locally parallel, globally sequential) and LSGP (locally sequential, globally parallel), can be unified under the methodology. It supports the design of high-speed, massively parallel processor arrays with efficient memory usage. More specifically, it leads to a novel systolic cache architecture consisting of shift registers only (a cache without tags). To demonstrate how the methodology works, we present several systolic design examples based on the block-matching motion estimation algorithm (BMA). By multiprojecting a 4D DG of the BMA onto a 2D mesh, we can reconstruct several existing array processors. By multiprojecting a 6D DG of the BMA, we derive a novel 2D systolic array that features significantly improved data reusability (96%) and processor utilization (99%).
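For orientation, the full-search BMA named in the abstract can be written as a four-deep loop nest. The sketch below is illustrative only and is not taken from the paper: it assumes the sum of absolute differences (SAD) as the matching criterion, which the abstract does not specify, and the function name `full_search_bma` and its parameters are hypothetical. The four loop indices (v, u, i, j) span a four-dimensional index space of the kind a 4D DG would describe; looping additionally over the block position in the frame would presumably give the two extra dimensions of a 6D DG.

```python
def full_search_bma(cur, ref, bx, by, n, p):
    """Full-search block matching (assumed SAD criterion).

    cur, ref : 2D lists of pixel values (current and reference frame)
    bx, by   : column and row of the block's top-left corner in `cur`
    n        : block size (n x n)
    p        : search range; displacements u, v run over [-p, p]
    Returns (u, v, sad) for the displacement with the smallest SAD.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for v in range(-p, p + 1):              # vertical displacement
        for u in range(-p, p + 1):          # horizontal displacement
            # Skip candidates that fall outside the reference frame.
            if by + v < 0 or by + v + n > h or bx + u < 0 or bx + u + n > w:
                continue
            sad = 0
            for i in range(n):              # row inside the block
                for j in range(n):          # column inside the block
                    sad += abs(cur[by + i][bx + j]
                               - ref[by + v + i][bx + u + j])
            if best is None or sad < best[2]:
                best = (u, v, sad)
    return best


# Usage: the current frame is the reference frame shifted by (u, v) = (-2, 1).
ref = [[y * 16 + x for x in range(16)] for y in range(16)]
cur = [[ref[y + 1][x - 2] if (y + 1 < 16 and x - 2 >= 0) else 0
        for x in range(16)] for y in range(16)]
print(full_search_bma(cur, ref, bx=6, by=6, n=4, p=3))  # prints (-2, 1, 0)
```

Every candidate displacement reads a reference block that overlaps heavily with its neighbours' blocks, which is the kind of data reuse the abstract's reusability figure refers to.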

Cite this article

Chen, YK., Kung, S. A Systolic Design Methodology with Application to Full-Search Block-Matching Architectures. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 19, 51–77 (1998). https://doi.org/10.1023/A:1008012332212
