An Optimization Methodology for Designing Hardware-Based Function Evaluation Modules with Reduced Complexity

Circuits, Systems, and Signal Processing

Abstract

The evaluation of mathematical functions is a critical task in many hardware designs, and piecewise polynomial approximation (PPA) is one of the most widely used techniques for function evaluation. PPA splits the function into segments, using uniform or non-uniform segmentation, and approximates each segment with a polynomial whose coefficients are stored in lookup tables. Several current hardware-based PPA implementations have address decoder units with considerable hardware complexity. To address this problem, this paper proposes a new methodology that searches for the optimal function segments, polynomial coefficients, and their fixed-point representation when the design is constrained to a given decoder word-length. A second-order PPA evaluation architecture based on Horner's rule and a simplified address decoder unit with reduced complexity are provided. The proposed methodology uses non-uniform segmentation relying on a linear combination of powers of two, which reduces the number of segments and, consequently, the hardware complexity of the address decoder unit. The figures of merit employed by the auto-tuning (self-adapting) segmentation process are the first-order derivative (slope) of the function and the signal-to-quantization-noise ratio (SQNR). Experimental results show a reduction in the hardware of the decoder design (number of segments, polynomial coefficient ROMs) compared with state-of-the-art proposals.
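The building blocks named in the abstract (non-uniform segments, a table of fixed-point coefficients, second-order evaluation with Horner's rule, and SQNR as a quality metric) can be illustrated with a minimal software sketch. Everything specific here is an assumption for illustration and not the paper's method: the target function 1/x, the hand-picked boundaries in `BOUNDS`, the 16-bit fractional word-length, and the three-point interpolation used in place of the paper's coefficient optimization. The names `quantize`, `fit_quadratic`, `build_table`, `evaluate`, and `sqnr_db` are hypothetical.

```python
import bisect
import math

# Hypothetical segment boundaries on [0.25, 1): each one is a sum of
# powers of two, and segments are narrower where 1/x is steeper.
BOUNDS = [0.25, 0.3125, 0.375, 0.5, 0.75, 1.0]
FRAC_BITS = 16  # assumed fractional word-length of the stored coefficients


def quantize(value, frac_bits=FRAC_BITS):
    """Round a real value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def fit_quadratic(f, a, b):
    """Interpolate f at a, the midpoint, and b (a stand-in for a real fit).

    Returns (c0, c1, c2) for p(t) = c0 + c1*t + c2*t**2, with t = x - a.
    """
    h = b - a
    y0, y1, y2 = f(a), f(a + h / 2), f(b)
    c1 = (-3 * y0 + 4 * y1 - y2) / h
    c2 = (2 * y0 - 4 * y1 + 2 * y2) / (h * h)
    return y0, c1, c2


def build_table(f, bounds):
    """One row of quantized coefficients per segment (the ROM contents)."""
    return [
        tuple(quantize(c) for c in fit_quadratic(f, a, b))
        for a, b in zip(bounds, bounds[1:])
    ]


def evaluate(table, bounds, x):
    """Select the segment (address decoding), then apply Horner's rule."""
    idx = bisect.bisect_right(bounds, x) - 1
    idx = max(0, min(idx, len(table) - 1))  # clamp to a valid table row
    c0, c1, c2 = table[idx]
    t = x - bounds[idx]
    return (c2 * t + c1) * t + c0  # two multiplies, two adds


def sqnr_db(f, table, bounds, samples=1000):
    """Signal-to-quantization-noise ratio in dB over evenly spaced samples."""
    lo, hi = bounds[0], bounds[-1]
    xs = [lo + (hi - lo) * k / samples for k in range(samples)]
    signal = sum(f(x) ** 2 for x in xs)
    noise = sum((f(x) - evaluate(table, bounds, x)) ** 2 for x in xs)
    return 10 * math.log10(signal / noise)


def reciprocal(x):
    return 1.0 / x


TABLE = build_table(reciprocal, BOUNDS)
```

Calling `sqnr_db(reciprocal, TABLE, BOUNDS)` reports the quality of this particular segmentation. In this sketch, a search over candidate boundary sets and word-lengths could use that figure, together with the local slope of the function, to decide where narrower segments are worthwhile.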

Availability Statement

The data will be made available on request.

Acknowledgements

The support of the Instituto Tecnológico de Sonora through PROFAPI Project Number 2021_0092 is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Javier Vázquez-Castillo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

González-Díaz-Conti, G., Longoria-Gandara, O., Carrasco-Alvarez, R. et al. An Optimization Methodology for Designing Hardware-Based Function Evaluation Modules with Reduced Complexity. Circuits Syst Signal Process 41, 1530–1549 (2022). https://doi.org/10.1007/s00034-021-01835-1

  • DOI: https://doi.org/10.1007/s00034-021-01835-1