An Optimization Methodology for Designing Hardware-Based Function Evaluation Modules with Reduced Complexity

González-Díaz-Conti, G.; Longoria-Gandara, O.; Carrasco-Alvarez, R.; Ruiz-Ibarra, E.; Castillo-Atoche, A.; Vázquez-Castillo, Javier

doi:10.1007/s00034-021-01835-1

Published: 11 October 2021

Volume 41, pages 1530–1549, (2022)

Circuits, Systems, and Signal Processing

Abstract

The evaluation of mathematical functions is a critical task in many hardware designs, and piecewise polynomial approximation (PPA) is one of the most widely used techniques for function evaluation. PPA splits the function domain into uniform or non-uniform segments and approximates the function on each segment with a polynomial whose coefficients are stored in lookup tables. Many current hardware-based PPA implementations have address decoder units with considerable hardware complexity. To address this problem, this paper proposes a new methodology that searches for the optimal function segments, polynomial coefficients, and their fixed-point representation when the design is constrained to a given decoder word length. A second-order PPA evaluation architecture based on Horner's rule and a simplified address decoder unit with reduced complexity are provided. The proposed methodology uses non-uniform segmentation relying on a linear combination of powers of two, which reduces the number of segments and, consequently, the hardware complexity of the address decoder unit. The figures of merit employed by the auto-tuning (self-adapting) segmentation process are the first-order derivative (slope) of the function and the signal-to-quantization-noise ratio (SQNR). Experimental results show a hardware reduction in the decoder design (number of segments, polynomial coefficient ROMs) when compared with state-of-the-art proposals.
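The ingredients named in the abstract can be illustrated with a small software model. The sketch below is not the authors' methodology: it assumes a fixed dyadic segmentation (boundaries at 2^-8, ..., 2^0), a target function sqrt(x), three-point interpolation to obtain the second-order coefficients, and a 16-bit fractional word length. All names (`FRAC_BITS`, `fit_segment`, `evaluate`, `sqnr_db`) are hypothetical. It only shows how a coefficient table, Horner's rule, and the SQNR metric fit together; the segments narrow toward zero, where the slope of sqrt(x) is steepest.

```python
import bisect
import math

FRAC_BITS = 16  # assumed fractional word length for the stored coefficients


def quantize(value, frac_bits=FRAC_BITS):
    """Round value to the nearest multiple of 2**-frac_bits (fixed-point model)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def fit_segment(f, a, b):
    """Quadratic through f at a, the midpoint, and b, in the local variable t = x - a."""
    h = (b - a) / 2
    y0, y1, y2 = f(a), f(a + h), f(b)
    c2 = (y2 - 2 * y1 + y0) / (2 * h * h)
    c1 = (y1 - y0) / h - c2 * h
    c0 = y0
    return tuple(quantize(c) for c in (c2, c1, c0))


# Assumed non-uniform segmentation: power-of-two boundaries on [2**-8, 1).
BOUNDS = [2.0 ** k for k in range(-8, 1)]
# Software stand-in for the coefficient ROMs: one (c2, c1, c0) entry per segment.
TABLE = [fit_segment(math.sqrt, a, b) for a, b in zip(BOUNDS, BOUNDS[1:])]


def evaluate(x):
    """Select the segment (the address decoder's job) and apply Horner's rule."""
    i = bisect.bisect_right(BOUNDS, x) - 1
    i = min(max(i, 0), len(TABLE) - 1)  # clamp inputs at the domain edges
    c2, c1, c0 = TABLE[i]
    t = x - BOUNDS[i]
    return (c2 * t + c1) * t + c0  # two multiplications, two additions


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return 10 * math.log10(signal / noise)


xs = [BOUNDS[0] + k * (1 - BOUNDS[0]) / 4096 for k in range(4096)]
ref = [math.sqrt(x) for x in xs]
approx = [evaluate(x) for x in xs]
print(f"{len(TABLE)} segments, SQNR = {sqnr_db(ref, approx):.1f} dB")
```

In hardware, the `bisect` lookup would be replaced by the address decoder, and power-of-two boundaries let that decoder inspect only a few leading bits of the input. The methodology described in the abstract additionally searches over segmentations and word lengths, which this fixed example does not attempt.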

Availability Statement

The data will be made available upon request.

Acknowledgements

The support of the Instituto Tecnológico de Sonora through PROFAPI Project Number 2021_0092 is gratefully acknowledged.

Author information

Authors and Affiliations

Electronics and Electrical Engineering Department, Instituto Tecnológico de Sonora, Cd. Obregón, Sonora, Mexico
G. González-Díaz-Conti & E. Ruiz-Ibarra
Department of Electronics, Systems and IT, Western Institute of Technology and Higher Education, Tlaquepaque, Mexico
O. Longoria-Gandara
Department of Electronics, University of Guadalajara, Guadalajara, Mexico
R. Carrasco-Alvarez
Department of Mechatronics, Autonomous University of Yucatán, Mérida, Mexico
A. Castillo-Atoche
Department of Electrical Engineering, University of Quintana Roo, Chetumal, Mexico
Javier Vázquez-Castillo

Corresponding author

Correspondence to Javier Vázquez-Castillo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

González-Díaz-Conti, G., Longoria-Gandara, O., Carrasco-Alvarez, R. et al. An Optimization Methodology for Designing Hardware-Based Function Evaluation Modules with Reduced Complexity. Circuits Syst Signal Process 41, 1530–1549 (2022). https://doi.org/10.1007/s00034-021-01835-1

Received: 12 March 2021
Revised: 24 August 2021
Accepted: 26 August 2021
Published: 11 October 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s00034-021-01835-1
