Decimal SRT Square Root: Algorithm and Architecture

Circuits, Systems, and Signal Processing

Abstract

Given the popularity of decimal arithmetic, hardware implementation of decimal operations has been an active research topic in recent decades. Besides the four basic operations, the square root can be implemented directly in hardware as an instruction, which improves the performance of the decimal floating-point unit of a processor. Decimal square rooters are usually implemented in hardware using either functional or digit-recurrence algorithms. Functional algorithms, which require a multiplication in every iteration, seem poorly suited to the decimal square root, given the high cost of decimal multipliers. Digit-recurrence square root algorithms, on the other hand, and in particular SRT algorithms (named after their creators, Sweeney, Robertson, and Tocher), are simple and well suited to decimal arithmetic. Aiming to reduce the latency of the decimal square root operation while keeping the cost reasonable, this paper proposes an SRT algorithm and the corresponding hardware architecture for computing the decimal square root. The proposed fixed-point square root design requires n+3 cycles to compute an n-digit root; synthesis results show an area of about 31K NAND2 and a cycle time of 40 FO4. These results indicate that the proposed decimal square root architecture is 14% faster than the fastest previous work (which uses a functional algorithm), at about a quarter of the area.
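To make the idea of a digit-recurrence square root concrete, the following minimal Python sketch produces one decimal root digit per iteration. It is not the SRT algorithm proposed in the paper: it assumes a plain restoring recurrence with the non-redundant digit set 0 to 9 and exhaustive digit selection, whereas an SRT design uses a redundant digit set and selects digits from estimates. The function name `decimal_sqrt_digits` and the operand scaling by 10^(2n) are illustrative assumptions, and the three extra cycles mentioned in the abstract are not modeled.

```python
def decimal_sqrt_digits(x_scaled: int, n: int) -> list[int]:
    """Return the n decimal digits of floor(sqrt(x_scaled)).

    x_scaled is a fixed-point operand x in [0, 1) scaled by 10**(2*n),
    so it has at most 2*n decimal digits (an assumption of this sketch).
    """
    if not 0 <= x_scaled < 10 ** (2 * n):
        raise ValueError("operand must fit in 2*n decimal digits")

    operand_digits = str(x_scaled).zfill(2 * n)
    remainder = 0   # partial remainder, kept exact
    root = 0        # root digits found so far, as an integer
    digits = []

    for j in range(n):
        # Bring down the next pair of operand digits.
        pair = int(operand_digits[2 * j: 2 * j + 2])
        remainder = remainder * 100 + pair

        # Digit selection: largest d with (20*root + d) * d <= remainder.
        d = 0
        while d < 9 and (20 * root + d + 1) * (d + 1) <= remainder:
            d += 1

        # Update the partial remainder and append the new root digit.
        remainder -= (20 * root + d) * d
        root = 10 * root + d
        digits.append(d)

    return digits


if __name__ == "__main__":
    # x = 0.5 with n = 4 root digits, i.e. x_scaled = 0.5 * 10**8
    print(decimal_sqrt_digits(50_000_000, 4))
```

The loop contains only additions, subtractions, comparisons, and small constant multiples of the partial root, which is the property that makes digit recurrence attractive when decimal multipliers are expensive.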

Notes

  1. ulp stands for unit in the least (significant) position.

  2. The upper bound of X with \(q_0 = 0\) is \(( 0.55\ldots )^{2} = ( \frac{5}{9} )^{2} = \frac{25}{81} \approx 0.3\).

  3. FO4 stands for fan-out of 4, i.e., the delay of an inverter driving four identical inverters at its output.
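
The bound in note 2 can be checked with exact rational arithmetic. The sketch below assumes that 0.55… denotes the repeating decimal whose fractional digits are all 5; the helper name `repeating_fives` is illustrative and does not come from the paper.

```python
from fractions import Fraction


def repeating_fives(k: int) -> Fraction:
    """Exact value of 0.55...5 with k fractional digits equal to 5."""
    return sum((Fraction(5, 10 ** i) for i in range(1, k + 1)), Fraction(0))


limit = Fraction(5, 9)   # value of the infinite repeating decimal 0.555...
bound = limit ** 2       # upper bound of X quoted in note 2

if __name__ == "__main__":
    # The truncated decimals approach 5/9 from below.
    print(limit - repeating_fives(20))
    print(bound, float(bound))
```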

Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors would like to express their appreciation for the comments of the anonymous reviewers for improving this paper.

Author information

Corresponding author

Correspondence to Seok-Bum Ko.

About this article

Cite this article

Kaivani, A., Ko, SB. Decimal SRT Square Root: Algorithm and Architecture. Circuits Syst Signal Process 32, 2137–2150 (2013). https://doi.org/10.1007/s00034-013-9586-3

  • DOI: https://doi.org/10.1007/s00034-013-9586-3
