Abstract
The evaluation of mathematical functions is a critical task in many hardware designs, and piecewise polynomial approximation (PPA) is one of the most widely used techniques for function evaluation. This technique uses uniform or non-uniform segmentation to split the function domain and approximates each function segment with a polynomial, whose coefficients are stored in lookup tables. However, many current hardware-based PPA implementations include address decoder units with considerable hardware complexity. To address this problem, this paper proposes a new methodology that searches for the optimal function segments, polynomial coefficients, and their fixed-point representation when the design is constrained to a given decoder word length. In addition, a second-order PPA evaluation architecture based on Horner's rule and a simplified address decoder unit with reduced complexity are presented. The proposed methodology uses non-uniform segmentation based on a linear combination of powers of two, which reduces the number of segments and, consequently, the hardware complexity of the address decoder unit. The figures of merit used by the auto-tuning (self-adapting) segmentation process are the first-order derivative (slope) of the function and the signal-to-quantization-noise ratio (SQNR) quality metric. Experimental results show a reduction in decoder hardware (number of segments and polynomial coefficient ROMs) when compared with state-of-the-art proposals.
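The evaluation flow summarized above (segment addressing, coefficient lookup, second-order evaluation with Horner's rule, and SQNR measurement) can be illustrated with a minimal Python sketch. It is not the paper's methodology: it assumes plain uniform segmentation addressed by the most significant input bits instead of the proposed power-of-two non-uniform segmentation, fits each segment by interpolation at three Chebyshev nodes, and uses an arbitrary example function, sqrt(1 + x) on [0, 1). The names `target`, `fit_quadratic`, `build_lut`, `ppa_eval`, and `sqnr_db`, as well as the word lengths, are hypothetical.

```python
import math

# Hypothetical sketch: uniform segmentation addressed by input MSBs,
# not the paper's power-of-two non-uniform segmentation.


def target(x):
    """Example function to approximate on [0, 1) (an assumption)."""
    return math.sqrt(1.0 + x)


def fit_quadratic(f, lo, hi):
    """Interpolate f at three Chebyshev nodes of [lo, hi].

    Returns (c0, c1, c2) for p(t) = c0 + c1*t + c2*t**2 with t = x - lo.
    """
    h = hi - lo
    xs = [lo + h * (1 - math.cos(math.pi * (2 * k + 1) / 6)) / 2 for k in range(3)]
    t = [x - lo for x in xs]
    y = [f(x) for x in xs]
    # Newton form: p(t) = y0 + d1*(t - t0) + c2*(t - t0)*(t - t1)
    d1 = (y[1] - y[0]) / (t[1] - t[0])
    d2 = (y[2] - y[1]) / (t[2] - t[1])
    c2 = (d2 - d1) / (t[2] - t[0])
    # Expand the Newton form into monomial coefficients in t
    c1 = d1 - c2 * (t[0] + t[1])
    c0 = y[0] - d1 * t[0] + c2 * t[0] * t[1]
    return c0, c1, c2


def build_lut(f, seg_bits, frac_bits):
    """Coefficient ROM: one integer (c0, c1, c2) triple per segment."""
    n = 1 << seg_bits
    scale = 1 << frac_bits
    return [tuple(round(c * scale) for c in fit_quadratic(f, s / n, (s + 1) / n))
            for s in range(n)]


def ppa_eval(xi, lut, in_bits, seg_bits, frac_bits):
    """Evaluate the PPA for an unsigned input xi representing x = xi / 2**in_bits."""
    shift = in_bits - seg_bits
    seg = xi >> shift             # address decoder: the MSBs select the segment
    t = xi & ((1 << shift) - 1)   # offset within the segment, weight 2**-in_bits
    c0, c1, c2 = lut[seg]
    # Horner's rule, p = (c2*t + c1)*t + c0, truncating after each product
    acc = ((c2 * t) >> in_bits) + c1
    acc = ((acc * t) >> in_bits) + c0
    return acc / (1 << frac_bits)


def sqnr_db(f, in_bits, seg_bits, frac_bits):
    """Exhaustive SQNR in dB over all 2**in_bits input codes."""
    lut = build_lut(f, seg_bits, frac_bits)
    sig = err = 0.0
    for xi in range(1 << in_bits):
        ref = f(xi / (1 << in_bits))
        y = ppa_eval(xi, lut, in_bits, seg_bits, frac_bits)
        sig += ref * ref
        err += (ref - y) ** 2
    return math.inf if err == 0 else 10 * math.log10(sig / err)


if __name__ == "__main__":
    # SQNR versus number of segments for a 12-bit input and 16 fractional bits
    for seg_bits in range(5):
        print(seg_bits, round(sqnr_db(target, 12, seg_bits, 16), 1))
```

Increasing `seg_bits` shrinks each segment and raises the SQNR until coefficient quantization and Horner truncation (set by `frac_bits`) dominate. Every extra address bit doubles the coefficient ROM depth, which is the segment-count versus accuracy trade-off that motivates reducing the number of segments.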
Availability Statement
The data will be made available upon request.
Acknowledgements
The support of the Instituto Tecnológico de Sonora through PROFAPI Project Number 2021_0092 is gratefully acknowledged.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
González-Díaz-Conti, G., Longoria-Gandara, O., Carrasco-Alvarez, R. et al. An Optimization Methodology for Designing Hardware-Based Function Evaluation Modules with Reduced Complexity. Circuits Syst Signal Process 41, 1530–1549 (2022). https://doi.org/10.1007/s00034-021-01835-1
DOI: https://doi.org/10.1007/s00034-021-01835-1