Abstract
Given the popularity of decimal arithmetic, hardware implementation of decimal operations has been an active research topic in recent decades. Besides the four basic operations, the square root can be implemented directly in hardware as an instruction, which improves the performance of a processor's decimal floating-point unit. Hardware decimal square rooters are usually built on either functional or digit-recurrence algorithms. Functional algorithms, which require a multiplication in every iteration, seem ill-suited to decimal square roots, given the high cost of decimal multipliers. Digit-recurrence square root algorithms, on the other hand, and particularly SRT algorithms (named after their creators Sweeney, Robertson, and Tocher), are simple and well suited to decimal arithmetic. With the aim of reducing the latency of the decimal square root operation at a reasonable cost, this paper proposes an SRT algorithm and the corresponding hardware architecture for computing the decimal square root. The proposed fixed-point square root design requires n+3 cycles to compute an n-digit root; the synthesis results show an area cost of about 31K NAND2 and a cycle time of 40 FO4. These results indicate a 14% speed advantage for the proposed decimal square root architecture over the fastest previous work (which uses a functional algorithm), at about a quarter of the area.
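To make the digit-recurrence idea concrete, the following is a minimal software sketch of a radix-10 square root that produces one root digit per iteration. It is not the SRT algorithm proposed in this paper: it uses the classical non-redundant, restoring recurrence with exhaustive digit selection, whereas an SRT design selects digits from a redundant digit set using a truncated estimate of the residual. The function name `decimal_sqrt_digits` and its string-based interface are assumptions made only for illustration.

```python
def decimal_sqrt_digits(radicand_digits: str, n: int) -> str:
    """Return the first n fractional digits of sqrt(0.<radicand_digits>), truncated.

    Illustrative restoring digit recurrence in radix 10 (not SRT).
    """
    # Each root digit consumes two radicand digits, so pad or cut to 2n digits.
    padded = radicand_digits.ljust(2 * n, "0")[: 2 * n]

    root = 0       # root digits found so far, held as an integer
    remainder = 0  # partial remainder (residual), held as an integer

    for j in range(n):
        # Shift in the next pair of radicand digits.
        pair = int(padded[2 * j : 2 * j + 2])
        remainder = remainder * 100 + pair

        # Digit selection: the largest d with (20*root + d) * d <= remainder.
        digit = 0
        for candidate in range(9, -1, -1):
            if (20 * root + candidate) * candidate <= remainder:
                digit = candidate
                break

        # Update the residual and append the selected digit to the root.
        remainder -= (20 * root + digit) * digit
        root = 10 * root + digit

    return str(root).zfill(n)


if __name__ == "__main__":
    print(decimal_sqrt_digits("25", 4))  # sqrt(0.25) = 0.5000      -> 5000
    print(decimal_sqrt_digits("5", 8))   # sqrt(0.5)  = 0.70710678… -> 70710678
```

The loop runs once per root digit, so an n-digit root takes n iterations. The digit-selection step here compares against full-width products, which is the expensive part in hardware; replacing it with a selection function over a few leading residual digits is what an SRT formulation is meant to achieve.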
Notes
ulp stands for unit in the least significant position.
The upper bound of \(X\) with \(q_{0} = 0\) is \(( 0.55 \ldots ) ^{2} = ( \frac{5}{9} ) ^{2} = \frac{25}{81} \approx 0.3\).
FO4 stands for fan-out of 4, i.e., the delay of an inverter driving four identical inverters at its output.
Acknowledgements
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors would like to express their appreciation for the comments of the anonymous reviewers for improving this paper.
Cite this article
Kaivani, A., Ko, SB. Decimal SRT Square Root: Algorithm and Architecture. Circuits Syst Signal Process 32, 2137–2150 (2013). https://doi.org/10.1007/s00034-013-9586-3
DOI: https://doi.org/10.1007/s00034-013-9586-3