Decimal SRT Square Root: Algorithm and Architecture

Kaivani, Amir; Ko, Seok-Bum

doi:10.1007/s00034-013-9586-3

Published: 10 April 2013

Volume 32, pages 2137–2150, (2013)

Circuits, Systems, and Signal Processing

Abstract

Given the popularity of decimal arithmetic, hardware implementation of decimal operations has been an active research topic in recent decades. Besides the four basic operations, the square root can be implemented directly in hardware as an instruction, which improves the performance of a processor's decimal floating-point unit. Hardware decimal square rooters are usually built on either functional or digit-recurrence algorithms. Functional algorithms, which require a multiplication per iteration, seem ill suited to decimal square root, given the high cost of decimal multipliers. Digit-recurrence square root algorithms, on the other hand, and SRT algorithms in particular (the method is named after its creators, Sweeney, Robertson, and Tocher), are simple and well suited to decimal arithmetic. Aiming to reduce the latency of the decimal square root operation at a reasonable cost, this paper proposes an SRT algorithm and the corresponding hardware architecture for computing the decimal square root. The proposed fixed-point square root design requires n+3 cycles to compute an n-digit root; synthesis results show an area of about 31K NAND2 and a cycle time of 40 FO4. These results give the proposed architecture a 14% speed advantage over the fastest previous work (which uses a functional algorithm), at about a quarter of the area.
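As a rough illustration of what "digit-recurrence" means in the abstract, the following Python sketch produces a decimal square root one digit per iteration. It is the textbook non-redundant (restoring) recurrence on integers, not the SRT algorithm the paper proposes: an SRT design uses a redundant signed-digit set and selects each digit from an estimate of the residual, and neither of those is modeled here. The function name `digit_recurrence_sqrt` and the integer scaling of the radicand are assumptions made for this example.

```python
def digit_recurrence_sqrt(radicand: int, n: int) -> tuple[int, int]:
    """Return (root, remainder) with root = floor(sqrt(radicand)).

    The radicand must satisfy 0 <= radicand < 10**(2*n), so the root has
    at most n decimal digits. One root digit is produced per iteration.
    """
    if n < 1 or not 0 <= radicand < 10 ** (2 * n):
        raise ValueError("radicand must fit in 2*n decimal digits")

    root = 0  # partial root built so far
    rem = 0   # partial remainder (residual)
    for j in range(n):
        # Bring down the next pair of decimal digits of the radicand.
        shift = 2 * (n - 1 - j)
        pair = (radicand // 10 ** shift) % 100
        rem = rem * 100 + pair

        # Digit selection: largest d in 0..9 with (20*root + d) * d <= rem.
        d = 0
        while d < 9 and (20 * root + d + 1) * (d + 1) <= rem:
            d += 1

        # Update the remainder and append the selected digit to the root.
        rem -= (20 * root + d) * d
        root = 10 * root + d
    return root, rem
```

For instance, `digit_recurrence_sqrt(20000000, 4)` returns `(4472, 1216)`, since 4472² = 19998784. The linear search over candidate digits is the part that a hardware design replaces with a fast selection function; it is kept here only because it makes the recurrence easy to follow.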

Notes

ulp means unit in least (significant) position.
The upper bound of X with \(q_{0}=0\) is \((0.55\ldots)^{2} = \left(\frac{5}{9}\right)^{2} = \frac{25}{81} \approx 0.3\).
FO4 denotes fan-out of 4, i.e., the delay of an inverter driving four identical inverters at its output.
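
The arithmetic behind the second and third notes can be spelled out as follows. The first line reads 0.55… as the repeating decimal 0.555…. The second line assumes that total latency is simply the cycle count times the cycle time, and takes n = 16 (the decimal64 precision) purely as an illustrative value; the n+3 cycles and 40 FO4 figures are those quoted in the abstract.

\[
0.55\ldots = 5\sum_{i=1}^{\infty} 10^{-i} = \frac{5}{9},
\qquad
\left(\frac{5}{9}\right)^{2} = \frac{25}{81} \approx 0.309
\]

\[
T_{\text{total}} = (n+3) \times 40\ \text{FO4}
\quad\Rightarrow\quad
n = 16:\ 19 \times 40 = 760\ \text{FO4}
\]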

References

M.F. Cowlishaw, Decimal floating-point: algorism for computers, in Proceedings of the 16th IEEE Symposium on Computer Arithmetic (June 2003), pp. 104–111
L. Eisen et al., IBM POWER6 accelerators: VMX and DFU. IBM J. Res. Dev. 51(6), 663–684 (2007)
M.D. Ercegovac, R. McIlhenny, Design and FPGA implementation of radix-10 algorithm for square root with limited precision primitives, in Proceedings of the 43rd Asilomar Conference on Signals, Systems and Computers (2009), pp. 935–939
S. Gorgin, G. Jaberipur, Fully redundant decimal arithmetic, in Proceedings of the 19th IEEE Symposium on Computer Arithmetic (2009), pp. 145–152
S. Gorgin, G. Jaberipur, A family of signed digit adders, in Proceedings of the 20th IEEE Symposium on Computer Arithmetic (2011), pp. 112–120
L. Han, S. Ko, High speed parallel decimal multiplication with redundant internal encodings. IEEE Trans. Comput. 62(5), 956–968 (2013)
IEEE Standards Committee, 754-2008 IEEE Standard for Floating-Point Arithmetic, pp. 1–58 (August 2008). doi:10.1109/IEEESTD.2008.4610935
G. Jaberipur, A. Kaivani, Improving the speed of parallel decimal multiplication. IEEE Trans. Comput. 58(11), 1539–1552 (2009)
A. Kaivani, G. Jaberipur, Fully redundant decimal addition and subtraction using stored-unibit encoding. Integration 43(1), 34–41 (2010)
A. Kaivani, G. Jaberipur, Decimal CORDIC rotation based on selection by rounding. Comput. J. 54(11), 1798–1809 (2011)
T. Lang, A. Nannarelli, A radix-10 digit-recurrence division unit: algorithm and architecture. IEEE Trans. Comput. 56(6), 727–739 (2007)
R. Raafat et al., Decimal Floating-Point Square-Root Unit Using Newton–Raphson Iterations. US Patent Application Publication, US 2012/0011182 (2012)
SilMinds, DFP Newton–Raphson Square Root Units. IP Core Product Data Sheet, NRDecDiv64/128
STMicroelectronics, 90nm CMOS090 Design Platform (2007)
A. Vazquez, J. Villalba, E. Antelo, Computation of decimal transcendental functions using the CORDIC algorithm, in Proceedings of the 19th IEEE Symposium on Computer Arithmetic (2009), pp. 179–186
A. Vazquez, E. Antelo, P. Montuschi, Improved design of high-performance parallel decimal multipliers. IEEE Trans. Comput. 59(5), 679–693 (2010)
L.K. Wang, M.J. Schulte, Decimal floating-point square root using Newton–Raphson iteration, in Proceedings of the 16th International Conference on Application Specific Systems, Architecture and Processors (2005), pp. 309–315

Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors would like to express their appreciation for the comments of the anonymous reviewers for improving this paper.

Author information

Authors and Affiliations

Department of Electrical & Computer Engineering, University of Saskatchewan, Saskatoon, SK, Canada
Amir Kaivani & Seok-Bum Ko

Corresponding author

Correspondence to Seok-Bum Ko.

About this article

Cite this article

Kaivani, A., Ko, SB. Decimal SRT Square Root: Algorithm and Architecture. Circuits Syst Signal Process 32, 2137–2150 (2013). https://doi.org/10.1007/s00034-013-9586-3

Received: 15 November 2012
Revised: 21 March 2013
Published: 10 April 2013
Issue Date: October 2013
DOI: https://doi.org/10.1007/s00034-013-9586-3
