Abstract
The consolidation of Internet devices into a single universal, portable device will soon be achievable by incorporating reconfigurable computing into a system-on-a-chip (SOC). At any given moment, such a device could act as a video/audio mobile phone, an MP3 player, or another device. The basic construct of the underlying multimedia processing algorithms can be described as a deep nested Do loop. These algorithms are considered among the most demanding data-intensive algorithms and are hence ideal candidates for an array of reconfigurable nanoprocessors. An algorithm-to-hardware synthesis methodology is therefore important for efficiently exploiting both spatial parallelism and temporal pipelining. In this paper, we propose a processor array synthesis methodology. It can map an n-level nested Do loop, represented by a nonuniform or shift-variant data dependence graph, to a near-optimal one- or two-dimensional processor array under the available resource constraints to satisfy high-throughput computation demands.
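To make the notion of a deep nested Do loop concrete, the sketch below writes a full-search block-matching motion estimator as a six-level loop nest. The abstract does not name a specific algorithm, so the choice of block matching, the function name `full_search_block_matching`, and its parameters are illustrative assumptions. The code shows only the kind of loop nest such a methodology would take as input, not the synthesis methodology itself.

```python
def full_search_block_matching(cur, ref, block, search):
    """Illustrative six-level loop nest (an assumption, not the paper's code).

    cur, ref : 2-D lists of pixel intensities with identical dimensions
    block    : block edge length in pixels
    search   : maximum displacement tested in each direction
    Returns {(block_row, block_col): (dy, dx)} minimizing the sum of
    absolute differences (SAD) for each block of the current frame.
    """
    height, width = len(cur), len(cur[0])
    motion = {}
    for bi in range(0, height - block + 1, block):        # level 1: block row
        for bj in range(0, width - block + 1, block):     # level 2: block column
            best_sad, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):         # level 3: vertical shift
                for dx in range(-search, search + 1):     # level 4: horizontal shift
                    # Skip candidates that fall outside the reference frame.
                    if bi + dy < 0 or bi + dy + block > height:
                        continue
                    if bj + dx < 0 or bj + dx + block > width:
                        continue
                    sad = 0
                    for i in range(block):                # level 5: pixel row
                        for j in range(block):            # level 6: pixel column
                            sad += abs(cur[bi + i][bj + j]
                                       - ref[bi + dy + i][bj + dx + j])
                    # Keep the first candidate with the strictly smallest SAD.
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            motion[(bi, bj)] = best_mv
    return motion
```

Every iteration of the two innermost loops performs the same subtract, absolute-value, and accumulate operation on different data, which is the kind of regularity a processor array can exploit spatially (many iterations at once) and temporally (pipelined accumulation).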
Cite this article
Kittitornkun, S., Hu, Y.H. Processor Array Synthesis from Shift-Variant Deep Nested Do Loops. The Journal of Supercomputing 24, 229–249 (2003). https://doi.org/10.1023/A:1022028729196
DOI: https://doi.org/10.1023/A:1022028729196