Stride Permutation Networks for Array Processors

Järvinen, Tuomas; Salmela, Perttu; Sorokin, Harri; Takala, Jarmo

doi:10.1007/s11265-006-0031-8

Tuomas Järvinen¹,
Perttu Salmela²,
Harri Sorokin² &
…
Jarmo Takala²

113 Accesses
8 Citations
Explore all metrics

Abstract

In several digital signal processing algorithms, computational nodes are organized in consecutive stages and data is reordered between these stages. Parallel computation of such algorithms with reduced number of processing elements implies that several computational nodes are assigned to each element. As a drawback, permutations become more complex and require data storage. In this paper, a systematic design methodology for stride permutation networks is derived. These permutations are represented with Boolean matrices, which are decomposed and mapped directly onto register-based networks. The resulting networks are regular and scalable and they support any stride of power-of-two. In addition, the networks reach the lower bound in the number of registers indicating area-efficiency. Since the proposed methodology is systematic, it can be exploited in automated design generation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Low Power Semi-systolic Architectures for Polynomial-Basis Multiplication over GF(2 m ) Using Progressive Multiplier Reduction

Article 19 April 2015

Decomposition of Index Generation Functions Using a Monte Carlo Method

Dynamic Scheduling of Pipelined Functional Units in Coarse-Grained Reconfigurable Array Elements

References

J. H. Takala, T. S. Järvinen, and H. T. Sorokin, “Conflict-free Parallel Memory Access Scheme for FFT Processors,” Proc. IEEE ISCAS, Bangkok, Thailand, May 25–28, 2003.
J. A. Hidalgo, J. Lopez, F. Argüello, and E. L. Zapata, “Area-efficient Architecture for Fast Fourier Transform,” IEEE Trans. Circuits Syst. II, vol. 46, no. 2, 1999, pp. 187–193 (Feb).
Article Google Scholar
S. Y. Kim, H. Kim, and I. C. Park, “Path Metric Memory Management for Minimising Interconnections in Viterbi Decoders,” IEEE Elect. Lett., vol. 37, no. 14, 2001, pp. 925–926 (July).
Article Google Scholar
H. S. Stone, “Parallel Processing with the Perfect Shuffle,” IEEE Trans. Comput., vol. C-20, no. 2, 1971, pp. 153–161 (Feb.).
Article Google Scholar
J. Granata, M. Conner, and R. Tolimieri, “Recursive Fast Algorithms and the Role of the Tensor Product,” IEEE Trans. Signal Process., vol. 40, no. 12, 1992, pp. 2921–2930 (Dec.).
Article MATH Google Scholar
J. Astola and D. Akopian, “Architecture-oriented Regular Algorithms for Discrete Sine and Cosine Transforms,” IEEE Trans. Signal Process., vol. 47, no. 4, 1999, pp. 1109–1124 (Apr.).
Article MATH Google Scholar
J. Kwak and J. You, “One- and Two-dimensional Constant Geometry Fast Cosine Transform Algorithms and Architectures,” IEEE Trans. Signal Process., vol. 47, no. 7, 1999, pp. 2023–2034 (July).
Article MathSciNet MATH Google Scholar
J. Takala, T. Järvinen, P. Salmela, and D. Akopian, “Multi-port Interconnection Networks for Radix-r Algorithms,” in Proc. IEEE ICASSP, Salt Lake City, UT, U.S.A., May 7–11 2001, pp. 1177–1180.
J. Takala and T. Järvinen, “Multi-port Interconnection Networks for Matrix Transpose,” in Proc. IEEE ICASSP, Phoenix, AZ, U.S.A., May 26–29 2002, pp. 874–877.
T. Järvinen and J. Takala, “Register-based Permutation Networks for Stride Permutations,” in Computer Systems: Architectures, Modeling, and Simulation (LNCS 3133), A. D. Pimentel and S. Vassiliadis (Eds.), Springer, Berlin Heidelberg New York, 2004, pp. 108–117.
Chapter Google Scholar
T. Järvinen, P. Salmela, H. Sorokin, and J. Takala, “Stride Permutation Networks for Array Processors,” in Proc. IEEE 15th International Conference on Application-Specific Systems, Architectures, and Processors, Galveston, TX, U.S.A., Sept. 27–29 2004.
T. Järvinen, “Systematic Methods for Designing Stride Permutation Interconnections,” Dr. Tech. dissertation, Tampere University of Technology, Tampere, Finland, Nov. 2004. Available online at http://webhotel.tut.fi/library/tutdiss/pdf/jarvinen.pdf.
E. Bidet, D. Castelain, C. Joanblang, and P. Senn, “A Fast Single-chip Implementation of 8192 Complex Point FFT,” IEEE J. Solid-State Circuits, vol. 30, no. 3, 1995, pp. 300–305 (Mar.).
Article Google Scholar
M. Bóo, F. Argüello, J. Bruguera, R. Doallo, and E. Zapata, “High-performance VLSI Architecture for the Viterbi Algorithm,” IEEE Trans. Commun., vol. 45, no. 2, 1997, pp. 168–176 (Feb.).
Article Google Scholar
C. B. Shung, H.-D. Lin, R. Cypher, P. H. Siegel, and H. K. Thapar, “Area-efficient Architectures for the Viterbi Algorithm Part I: Theory,” IEEE Trans. Commun., vol. 41, no. 4, 1993, pp. 636–644 (Apr.).
Article MATH Google Scholar
K. K. Parhi, “Systematic Synthesis of DSP Data Format Converters Using Life-time Analysis and Forward–backward Register Allocation,” IEEE Trans. Circuits Syst. II, vol. 39, no. 7, 1992, pp. 423–440 (Jul.).
Article Google Scholar
M. Majumdar and K. K. Parhi, “Design of Data Format Converters Using Two-dimensional Register Allocation,” IEEE Trans. Circuits Syst. II, vol. 45, no. 4, 1998, pp. 504–508 (Apr.).
Article MATH Google Scholar
J. Bae and V. K. Prasanna, “Synthesis of Area-efficient and High-throughput Rate Data Format Converters,” IEEE Trans. VLSI Syst., vol. 6, no. 4, 1998, pp. 697–706 (Dec.).
Article Google Scholar
K. Srivatsan, C. Chakrabarti, and L. Lucke, “A New Register Allocation Scheme for Low-power Data Format Converters,” IEEE Trans. Circuits Syst. II, vol. 46, no. 9, 1999, pp. 1250–1253 (Sept.).
Article Google Scholar
M. Davio, “Kronecker Products and Shuffle Algebra,” IEEE Trans. Comput., vol. 30, no. 2, 1981, pp. 116–125 (Feb.).
Article MathSciNet Google Scholar
J. C. Carlach, P. Penard, and J. L. Sicre, “TCAD: a 27 MHz 8 × 8 Discrete Cosine Transform Chip,” in Proc. IEEE ICASSP, Glasgow, UK, May 23–26 1989, pp. 2429–2432.
M. Kovac and P. Ranganathan, “JAGUAR: A High Speed VLSI Chip for JPEG Image Compression Standard,” in Proc. IEEE Int. Conf. on VLSI Design, New Delhi, India, Jan. 4–7 1995, pp. 220–224.

Download references

Author information

Authors and Affiliations

Nokia Technology Platforms, P. O. Box 68, 33721, Tampere, Finland
Tuomas Järvinen
Tampere University of Technology, P. O. Box 68, 33721, Tampere, Finland
Perttu Salmela, Harri Sorokin & Jarmo Takala

Authors

Tuomas Järvinen
View author publications
You can also search for this author in PubMed Google Scholar
Perttu Salmela
View author publications
You can also search for this author in PubMed Google Scholar
Harri Sorokin
View author publications
You can also search for this author in PubMed Google Scholar
Jarmo Takala
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tuomas Järvinen.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Järvinen, T., Salmela, P., Sorokin, H. et al. Stride Permutation Networks for Array Processors. J VLSI Sign Process Syst Sign Im 49, 51–71 (2007). https://doi.org/10.1007/s11265-006-0031-8

Download citation

Published: 27 March 2007
Issue Date: October 2007
DOI: https://doi.org/10.1007/s11265-006-0031-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Stride Permutation Networks for Array Processors

Abstract

Access this article

Similar content being viewed by others

Low Power Semi-systolic Architectures for Polynomial-Basis Multiplication over GF(2 m ) Using Progressive Multiplier Reduction

Decomposition of Index Generation Functions Using a Monte Carlo Method

Dynamic Scheduling of Pipelined Functional Units in Coarse-Grained Reconfigurable Array Elements

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Stride Permutation Networks for Array Processors

Abstract

Access this article

Similar content being viewed by others

Low Power Semi-systolic Architectures for Polynomial-Basis Multiplication over GF(2 m ) Using Progressive Multiplier Reduction

Decomposition of Index Generation Functions Using a Monte Carlo Method

Dynamic Scheduling of Pipelined Functional Units in Coarse-Grained Reconfigurable Array Elements

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation