Radix-10 Restoring Square Root for 6-input LUTs Programmable Devices

Vázquez, Martín; Tosini, Marcelo; Leiva, Lucas

doi:10.1007/s00034-020-01571-y

Published: 28 October 2020

Volume 40, pages 2335–2360 (2021)
Circuits, Systems, and Signal Processing


Abstract

This paper proposes efficient fixed-point and floating-point implementations of radix-10 square root on Xilinx FPGA devices. The method is a digit-recurrence algorithm with restoring, and it supports the three decimal floating-point (DFP) formats specified in the IEEE 754-2008 standard. The restoring technique is presented as optimal and novel. The designs rely on new techniques that make efficient use of the dedicated resources of the programmable devices. Implementations were made on Xilinx 7-series devices. The fixed-point square root operates at up to 212 MHz for p=7, 197 MHz for p=16, and 190 MHz for p=34, where p is the precision in decimal digits. The DFP square root reaches 194 MHz for p=7, 183 MHz for p=16, and 174 MHz for p=34. The proposed architecture achieves better computation times than related works.
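As a rough illustration of what a radix-10 restoring digit recurrence computes, the following software sketch produces one decimal result digit per iteration. It is the classical pencil-and-paper scheme, not the authors' FPGA architecture: the function name `restoring_sqrt_radix10` and the parameter `frac_digits` are hypothetical, and the hardware-specific restoring technique of the paper is not reproduced here.

```python
from math import isqrt


def restoring_sqrt_radix10(x: int, frac_digits: int = 0) -> tuple[int, int]:
    """Digit-by-digit decimal square root with restoring.

    Returns (root, rem) such that
        root == floor(sqrt(x * 10**(2 * frac_digits)))
        rem  == x * 10**(2 * frac_digits) - root**2
    """
    if x < 0:
        raise ValueError("x must be non-negative")

    digits = str(x)
    if len(digits) % 2:            # group the radicand into pairs of digits
        digits = "0" + digits
    digits += "00" * frac_digits   # each extra pair yields one more result digit

    root = 0   # partial root (all digits selected so far)
    rem = 0    # partial remainder
    for i in range(0, len(digits), 2):
        rem = rem * 100 + int(digits[i:i + 2])   # bring down the next pair
        digit = 0
        while True:
            # Trial subtraction for the next candidate digit. The subtrahends
            # (20*root + 1), (20*root + 3), ... sum to (20*root + d) * d.
            trial = rem - (20 * root + 2 * digit + 1)
            if trial < 0:
                break              # restore: discard the negative trial value
            rem = trial
            digit += 1
        root = 10 * root + digit   # append the selected digit (always 0..9)
    return root, rem


root, rem = restoring_sqrt_radix10(2, frac_digits=6)
assert root == isqrt(2 * 10**12)
print(root, rem)   # root is 1414213, i.e. sqrt(2) truncated to 1.414213
```

Each iteration here performs up to ten sequential trial subtractions, which is simple to follow but slow. A hardware design would normally restructure the digit selection so that one result digit is produced per clock cycle.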

References

A. Amaricai, O. Boncalo, Fpga implementation of very high radix square root with prescaling, in 2012 19th IEEE International Conference on Electronics, Circuits, and Systems (ICECS 2012), pp. 221–224. IEEE (2012)
J.M. Anderson, C. Tsen, L.K. Wang, K. Compton, J.M. Schulte, Performance analysis of decimal floating-point libraries and its impact on decimal hardware and software solutions, in 2009 IEEE International Conference on Computer Design, pp. 465–471. IEEE (2009)
F. Batista, Decimal data type. World Wide Web. http://www.python.org/dev/peps/pep-0327, version 62268 (2003). Accessed 20 Feb 2020
M. Bhat, J. Crawford, R. Morin, K. Shiv, Performance characterization of decimal arithmetic in commercial java workloads, in 2007 IEEE International Symposium on Performance Analysis of Systems & Software, pp. 54–61. IEEE (2007)
P. Corsonello, S. Perri, High performance square rooting circuit using hybrid radix-2 adders. Electron. Lett. 35(3), 185–186 (1999)
M. Cowlishaw, The decnumber library, v3. 68. World Wide Web http://speleotrove.com/decimal/decnumber.pdf (2010). Accessed 2 Feb 2020
M.F. Cowlishaw, Decimal floating-point: Algorism for computers. In: Proceedings 2003 16th IEEE Symposium on Computer Arithmetic, pp. 104–111. IEEE (2003)
P. Crismer, Eiffel decimal arithmetic library. World Wide Web. http://www.gobosoft.com/eiffel/gobo/math/decimal/index.html (2019). Accessed 18 Feb 2020
D. Currie, Lua decnumber library. World Wide Web. http://files.luaforge.net/releases/ldecnumber/ldecnumber/ldecNumber-21, version 21 (2007). Accessed 19 Jan 2020
F. De Dinechin, M. Joldes, B. Pasca, G. Revy, Multiplicative square root algorithms for fpgas, in 2010 International Conference on Field Programmable Logic and Applications, pp. 574–577. IEEE (2010)
N. Dlodlo, M. Mofolo, L. Masoane, S. Mncwabe, G. Sibiya, L. Mboweni, Research trends in existing technologies that are building blocks to the internet of things. in Innovations and Advances in Computing, Informatics, Systems Sciences, Networking and Engineering, pp. 539–548. Springer (2015)
A.Y. Duale, M.H. Decker, H.G. Zipperer, M. Aharoni, T.J. Bohizic, Decimal floating-point in z9: an implementation and testing perspective. IBM J. Res. Develop. 51(1.2), 217–227 (2007)
L. Eisen, J. Ward, H.W. Tast, N. Mading, J. Leenstra, S.M. Mueller, C. Jacobi, J. Preiss, E.M. Schwarz, S.R. Carlough, Ibm power6 accelerators: Vmx and dfu. IBM J. Res. Develop. 51(6), 1–21 (2007)
M. Ercegovac, J.M. Muller, Digit-recurrence algorithms for division and square root with limited precision primitives, in The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, pp. 1440–1444. IEEE (2003)
M.D. Ercegovac, T. Lang, Digital Arithmetic (Elsevier, Amsterdam, 2004)
M.D. Ercegovac, R. McIlhenny, Design and fpga implementation of radix-10 algorithm for square root with limited precision primitives, in 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, pp. 935–939. IEEE (2009)
M.D. Ercegovac, R. McIlhenny, Design and fpga implementation of radix-10 combined division/square root algorithm with limited precision primitives, in 2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers, pp. 87–91. IEEE (2010)
M.D. Ercegovac, R. McIlhenny, Shared implementation of radix-10 and radix-16 square root algorithm with limited precision primitives, in 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp. 345–349. IEEE (2012)
J. Fandrianto, Algorithm for high speed shared radix 4 division and radix 4 square-root, in 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH), pp. 73–79. IEEE (1987)
J. Fandrianto, Algorithm for high speed shared radix 8 division and radix 8 square root, in Proceedings of 9th Symposium on Computer Arithmetic, pp. 68–75 (1989)
A. Hosseiny, G. Jaberipur, Decimal square root: algorithm and hardware implementation. Circuits Syst. Sig. Process. 35(12), 4195–4219 (2016)
IEEE: Ieee standard for floating-point arithmetic. IEEE Std 754-2008 pp. 1–70 (2008). https://doi.org/10.1109/IEEESTD.2008.4610935
A. Jena, S.K. Panda, Fpga-vhdl implementation of pipelined square root circuit for vlsi signal processing applications. Int. J. Comp. Appl. 975, 8887 (2016)
K. Jun, E.E. Swartzlander, Improved non-restoring square root algorithm with dual path calculation, in 2014 48th Asilomar Conference on Signals, Systems and Computers, pp. 1243–1246. IEEE (2014)
H. Kabuo, T. Taniguchi, A. Miyoshi, H. Yamashita, M. Urano, H. Edamatsu, S. Kuninobu, Accurate rounding scheme for the newton-raphson method using redundant binary representation. IEEE Trans. Comp. 43(1), 43–51 (1994)
A. Kaivani, S.B. Ko, Decimal srt square root: algorithm and architecture. Circuits Syst. Sig. Process. 32(5), 2137–2150 (2013)
M. Kavis, The internet of things will radically change your big data strategy (2014). http://www.forbes.com/sites/mikekavis/2014/06/26/the-internet-of-things-will-radically-change-your-big-data-strategy/. Accessed 17 Feb 2020
T.J. Kwon, J. Draper, Floating-point division and square root implementation using a taylor-series expansion algorithm with reduced look-up tables, in: 2008 51st Midwest Symposium on Circuits and Systems, pp. 954–957. IEEE (2008)
LabSET: Vhdl implementation of radix-10 restoring square root (2019). https://github.com/LabSET-UNICEN/radix-10_sqrt
S. Lachowicz, H.J. Pfleiderer, Fast evaluation of the square root and other nonlinear functions in fpga, in 4th IEEE International Symposium on Electronic Design, Test and Applications (delta 2008), pp. 474–477. IEEE (2008)
T. Lang, P. Montuschi, Very-high radix combined division and square root with prescaling and selection by rounding, in Proceedings of the 12th Symposium on Computer Arithmetic, pp. 124–131. IEEE (1995)
T. Lang, P. Montuschi, Very high radix square root with prescaling and rounding and a combined division/square root unit. IEEE Trans. Comp. 48(8), 827–841 (1999)
Y. Li, W. Chu, A new non-restoring square root algorithm and its vlsi implementations, in Proceedings International Conference on Computer Design. VLSI in Computers and Processors, pp. 538–544. IEEE (1996)
Y. Li, W. Chu, Implementation of single precision floating point square root on fpgas, in Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No. 97TB100186), pp. 226–232. IEEE (1997)
Y. Li, W. Chu, Parallel-array implementations of a non-restoring square root algorithm, in Proceedings International Conference on Computer Design VLSI in Computers and Processors, pp. 690–695. IEEE (1997)
S.E. McQuillan, J.V. McCanny, R. Hamill, New algorithms and vlsi architectures for srt division and square root, in Proceedings of IEEE 11th Symposium on Computer Arithmetic, pp. 80–86. IEEE (1993)
P. Montuschi, L. Ciminiera, Reducing iteration time when result digit is zero for radix 2 srt division and square root with redundant remainders. IEEE Trans. Comp. 42(2), 239–246 (1993)
A. Nannarelli, Decimal engine for energy-efficient multicore processors, in 2014 22nd International Conference on Very Large Scale Integration (VLSI-SoC), pp. 1–6. IEEE (2014)
B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs (Oxford University Press, Oxford, 2000)
J.A. Pineiro, J.D. Bruguera, High-speed double-precision computation of reciprocal, division, square root, and inverse square root. IEEE Trans. Comp. 51(12), 1377–1388 (2002)
A. Rahman et al., New efficient hardware design methodology for modified non-restoring square root algorithm. In: 2014 International Conference on Informatics, Electronics & Vision (ICIEV), pp. 1–6. IEEE (2014)
C.V. Ramamoorthy, J.R. Goodman, K. Kim, Some properties of iterative square-rooting methods using high-speed multiplication. IEEE Trans. Comp. 100(8), 837–847 (1972)
I. Sajid, M. Ahmed, S.G. Ziavras, Novel pipelined architecture for efficient evaluation of the square root using a modified non-restoring algorithm. J. Sig. Process. Syst. 67(2), 157–166 (2012)
M.J. Schulte, N. Lindberg, A. Laxminarain, Performance evaluation of decimal floating-point arithmetic, in: Proceedings of the 6th IBM Austin Center for Advanced Studies Conference (2005)
E.M. Schwarz, J.S. Kapernick, M.F. Cowlishaw, Decimal floating-point support on the ibm system z10 processor. IBM J. Res. Develop. 53(1), 1–4 (2009)
S. Suresh, S.F. Beldianu, S.G. Ziavras, Fpga and asic square root designs for high performance and power efficiency, in 2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors, pp. 269–272. IEEE (2013)
N. Takagi, K. Takagi, A vlsi algorithm for integer square-rooting, in 2006 International Symposium on Intelligent Signal Processing and Communications, pp. 626–629. IEEE (2006)
A.J. Thakkar, A. Ejnioui, Design and implementation of double precision floating point division and square root on fpgas, in 2006 IEEE Aerospace Conference, pp. 7–pp. IEEE (2006)
Á. Vázquez, J.D. Bruguera, Iterative algorithm and architecture for exponential, logarithm, powering, and root extraction. IEEE Trans. Comp. 62(9), 1721–1731 (2012)
A. Vázquez, F. de Dinechin, Efficient implementation of parallel bcd multiplication in lut-6 fpgas, in 2010 International Conference on Field-Programmable Technology, pp. 126–133. IEEE (2010)
A. Vazquez, F. de Dinechin, Multi-operand decimal tree adders for fpgas. Research Report (2010)
M. Vázquez, L. Leiva, G. Sutter, Radix-10 decimal logarithm by direct selection for 6-input luts programmable devices. Microprocess. Microsyst. 64, 143–158 (2019)
M. Vázquez, E. Todorovich, Fpga-specific decimal sign-magnitude addition and subtraction. Int. J. Electron. 103(7), 1166–1185 (2016)
M. Vázquez, M. Tosini, Design and implementation of decimal fixed-point square root in lut-6 fpgas, in 2014 IX Southern Conference on Programmable Logic (SPL), pp. 1–6. IEEE (2014)
K. Vijeyakumar, V. Sumathy, P. Vasakipriya, A.D. Babu, Fpga implementation of low power high speed square root circuits, in 2012 IEEE International Conference on Computational Intelligence and Computing Research, pp. 1–5. IEEE (2012)
L.K. Wang, M.J. Schulte, Decimal floating-point square root using newton-raphson iteration, in 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP’05), pp. 309–315. IEEE (2005)
X. Wang, B.E. Nelson, Tradeoffs of designing floating-point division and square root on virtex fpgas, in 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003. FCCM 2003., pp. 195–203. IEEE (2003)
Xilinx: Ise design suite 14: Release notes, installation and licensing (2013). www.xilinx.com
Xilinx Xst user guide for virtex-6, spartan-6 and 7 series devices (2013). www.xilinx.com
Xilinx 7 series fpgas configuration—user guides (2016). www.xilinx.com
Xilinx Ultrascale architecture and product data sheet: Overview (2018). www.xilinx.com
Xilinx Vivado design suite user guide—design flows overview (2018). www.xilinx.com
Xilinx Vivado design suite user guide—synthesis (2019). www.xilinx.com
B. Yang, D. Wang, L. Liu, Complex division and square-root using cordic, in 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet), pp. 2464–2468. IEEE (2012)


Acknowledgements

This work was partially supported by research project funds provided by the Research Secretary of the Faculty of Engineering of FASTA University and by SeCAT of UNICEN University.

The data that support the findings of this study are openly available in GitHub at https://github.com/LabSET-UNICEN/radix-10_sqrt, reference number 29.

Author information

Authors and Affiliations

Computer and Systems Department, UNICEN, Tandil, Argentina
Martín Vázquez, Marcelo Tosini & Lucas Leiva

Corresponding author

Correspondence to Lucas Leiva.

About this article

Cite this article

Vázquez, M., Tosini, M. & Leiva, L. Radix-10 Restoring Square Root for 6-input LUTs Programmable Devices. Circuits Syst Signal Process 40, 2335–2360 (2021). https://doi.org/10.1007/s00034-020-01571-y

Received: 02 March 2020
Revised: 03 October 2020
Accepted: 10 October 2020
Published: 28 October 2020
Issue Date: May 2021
DOI: https://doi.org/10.1007/s00034-020-01571-y
