Abstract
Digital transforms are dominated by multiply-accumulate operations, which carry a high computational cost. Advances in computer arithmetic and digital technologies simplify the processing of complex algorithms when they are implemented in modern circuits. New computation techniques can therefore be explored to provide efficient operational methods for implementing algorithms that avoid many of the complex and costly mathematical operations. This work aims to design a high-performance architecture for computing some common digital transforms. The proposed architecture is compared with other methods, using the discrete cosine transform as the example transform. The results show that the proposal offers performance comparable to or better than the best-known methods.
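To make the cost claim concrete, the following is a minimal sketch of a direct 1-D DCT-II that counts its multiply-accumulate steps. It is not the architecture proposed in this work; the function name `dct_ii_direct`, the orthonormal scaling, and the 8-point input are assumptions chosen for illustration only.

```python
import math


def dct_ii_direct(x):
    """Direct orthonormal 1-D DCT-II of a sequence x.

    Illustrative only: returns the coefficients together with the number
    of multiply-accumulate (MAC) steps used by the inner loop.
    """
    n = len(x)
    coeffs = []
    macs = 0
    for k in range(n):
        # Orthonormal scaling factor (an assumed convention).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = 0.0
        for i in range(n):
            # One multiplication and one accumulation per input sample.
            acc += x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            macs += 1
        coeffs.append(scale * acc)
    return coeffs, macs


# An 8-point constant signal: all energy lands in the first coefficient.
coeffs, macs = dct_ii_direct([1.0] * 8)
print(macs)  # 64 MAC steps for N = 8, i.e. N * N
```

The direct form needs N² multiply-accumulate steps for an N-point transform, which is the cost that fast algorithms and multiplierless hardware designs try to reduce.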






Acknowledgements
This work was supported by the Spanish Research Agency (AEI) and the European Regional Development Fund (FEDER) under project “CloudDriver4Industry” TIN2017-89266-R.
Cite this article
Mora, H., Signes-Pont, M.T., Jimeno-Morenilla, A. et al. High-performance architecture for digital transform processing. J Supercomput 75, 1336–1349 (2019). https://doi.org/10.1007/s11227-018-2436-0
DOI: https://doi.org/10.1007/s11227-018-2436-0