Mapping of Discrete Cosine Transforms onto Distributed Hardware Architectures

Journal of Signal Processing Systems

Abstract

We present an algorithmically aware, high-level partitioning methodology for discrete cosine transforms (DCTs) targeted to distributed hardware architectures. The methodology explores alternate DCT formulations as part of the partition optimization process. To the best of our knowledge, no previously proposed DCT algorithm consistently produces alternate regular formulations for a DCT of size n. We therefore developed a new Cooley-Tukey-like DCT factorization algorithm that makes this exploration possible. Combining our factorization mechanism with a greedy strategy for exploring the space of equivalent DCT formulations yielded partitioning solutions with reductions of up to 18% in latency and up to 83% in run-time compared with previously proposed regular DCT formulations.
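To make the idea of a DCT factorization concrete, the following minimal sketch shows one well-known recursive split of the unscaled DCT-II (a Lee-style even/odd decomposition) and checks it against the direct definition. It is an illustrative assumption only: it is not the Cooley-Tukey-like factorization developed in this article, and the function names `dct2_naive` and `dct2_recursive` are hypothetical.

```python
import math


def dct2_naive(x):
    """Unscaled DCT-II by direct summation: X[k] = sum_j x[j] cos(pi (2j+1) k / (2n))."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n))
        for k in range(n)
    ]


def dct2_recursive(x):
    """Unscaled DCT-II via a recursive even/odd split (Lee-style), power-of-two sizes only."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [x[0]]
    half = n // 2
    # Even-indexed outputs are a half-size DCT-II of the mirrored sums.
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    # Odd-indexed outputs come from a half-size DCT-II of scaled mirrored differences.
    diffs = [
        (x[i] - x[n - 1 - i]) / (2.0 * math.cos(math.pi * (2 * i + 1) / (2 * n)))
        for i in range(half)
    ]
    even = dct2_recursive(sums)
    b = dct2_recursive(diffs)
    out = [0.0] * n
    for m in range(half):
        out[2 * m] = even[m]
        # X[2m+1] = B[m] + B[m+1]; the term B[half] is identically zero.
        out[2 * m + 1] = b[m] + (b[m + 1] if m + 1 < half else 0.0)
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    direct = dct2_naive(signal)
    fast = dct2_recursive(signal)
    print(max(abs(a - b) for a, b in zip(direct, fast)))
```

Both functions compute the same transform; they differ only in how the computation is structured. Differences of this kind in dataflow structure are what a partitioner can exploit when choosing among equivalent formulations.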

Acknowledgements

This work was performed at the University of Puerto Rico at Mayagüez with support from NSF grants CNS-0424546 and HRD-9817642.

Author information

Corresponding author

Correspondence to Rafael A. Arce-Nazario.

About this article

Cite this article

Arce-Nazario, R.A., Jiménez, M. & Rodríguez, D. Mapping of Discrete Cosine Transforms onto Distributed Hardware Architectures. J Sign Process Syst Sign Image Video Technol 53, 367–382 (2008). https://doi.org/10.1007/s11265-008-0239-x

  • DOI: https://doi.org/10.1007/s11265-008-0239-x
