Abstract
In this paper, we present a novel scheme for performing fixed-point arithmetic efficiently on fine-grain, massively parallel, programmable architectures, including both custom and FPGA-based systems. We achieve an O(n) speedup, where n is the operand precision, over the bit-serial methods of existing fine-grain systems such as the DAP, the MPP, and the CM2, within the constraints of regular, near-neighbor communication and only a small amount of on-chip memory. This is made possible by digit pipelined algorithms, which avoid broadcast and operate in a fully systolic manner by pipelining at the digit level. A base 4, signed-digit, fully redundant number system and on-line techniques are used to limit carry propagation and minimize communication costs.

Although our algorithms are digit-serial, we are able to match the performance of bit-parallel methods while retaining low communication complexity. Reconfigurable hardware systems built using field programmable gate arrays (FPGAs) can share in the speed benefits of these algorithms. By using the organization of logic blocks suggested in this paper, the placement and routing problems that exist in such systems can be avoided. Since the algorithms are amenable to pipelining, very high throughput can be obtained.
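To illustrate how a redundant base 4 signed-digit representation limits carry propagation, the following is a minimal sketch of the classical carry-free signed-digit addition rule. It is not the paper's own algorithm, which this abstract does not spell out. The sketch assumes that "fully redundant" means the digit set {-3, ..., 3}; the names `sd_add`, `int_to_sd`, and `sd_to_int` are hypothetical helpers introduced only for this example. Digits are stored least significant first.

```python
from typing import List

RADIX = 4
MAX_DIGIT = 3  # assumption: "fully redundant" base 4 uses digits -3..3


def int_to_sd(value: int, n: int) -> List[int]:
    """Encode an integer as n base-4 signed digits (one of many valid encodings)."""
    sign = -1 if value < 0 else 1
    mag = abs(value)
    digits = []
    for _ in range(n):
        digits.append(sign * (mag % RADIX))
        mag //= RADIX
    if mag:
        raise ValueError("value does not fit in n digits")
    return digits


def sd_to_int(digits: List[int]) -> int:
    """Decode least-significant-first signed digits back to an integer."""
    value = 0
    for d in reversed(digits):
        value = value * RADIX + d
    return value


def sd_add(x: List[int], y: List[int]) -> List[int]:
    """Add two signed-digit numbers; a transfer moves at most one position."""
    assert len(x) == len(y)
    n = len(x)
    transfer = [0] * (n + 1)  # transfer[i] enters position i
    interim = [0] * n
    for i in range(n):
        z = x[i] + y[i]  # lies in -6..6
        if z >= MAX_DIGIT:
            t = 1
        elif z <= -MAX_DIGIT:
            t = -1
        else:
            t = 0
        transfer[i + 1] = t
        interim[i] = z - RADIX * t  # lies in -2..2
    # Each sum digit depends only on its own position and the one below it,
    # so adding the incoming transfer can never generate a further carry.
    return [interim[i] + transfer[i] for i in range(n)] + [transfer[n]]
```

For example, `sd_add(int_to_sd(63, 3), int_to_sd(1, 3))` produces a digit vector that decodes to 64 without any ripple through the three positions. Every output digit depends on a fixed-size neighborhood of the inputs, which is the property that makes near-neighbor, digit-level pipelining plausible in the setting the abstract describes.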
Nagendra, C., Owens, R.M. & Irwin, M.J. Digit pipelined arithmetic on fine-grain array processors. Journal of VLSI Signal Processing 9, 193–209 (1995). https://doi.org/10.1007/BF02407085