Fast and Accurate Floating Point Summation with Application to Computational Geometry

Demmel, James; Hida, Yozo

doi:10.1023/B:NUMA.0000049458.99541.38

Published: December 2004

Volume 37, pages 101–112 (2004)

Numerical Algorithms

Abstract

We present several simple algorithms for accurately computing the sum of n floating point numbers using a wider accumulator. Let f and F be the number of significant bits in the summands and the accumulator, respectively. Then, assuming gradual underflow, no overflow, and round-to-nearest arithmetic, up to ⌊2^(F−f)/(1−2^(−f))⌋+1 numbers can be accurately added by just summing the terms in decreasing order of exponents, yielding a sum correct to within about 1.5 units in the last place. In particular, if the sum is zero, it is computed exactly. We apply this result to the floating point formats in the IEEE floating point standard, and investigate its performance. Our results show that in the absence of massive cancellation (the most common case) the cost of guaranteed accuracy is about 30–40% more than that of straightforward summation. If massive cancellation does occur, the cost of computing the accurate sum is about a factor of ten. Finally, we apply our algorithm in computing a robust geometric predicate (used in computational geometry), where our accurate summation algorithm improves the existing algorithm by a factor of two on a nearly coplanar set of points.
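
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; it assumes single-precision summands (f = 24) and a double-precision accumulator (F = 53), and the helper names to_single, max_terms, and sum_decreasing_exponent are hypothetical. Single precision is simulated by rounding Python floats through the struct module.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_terms(f: int, F: int) -> int:
    """Bound from the abstract: floor(2^(F-f) / (1 - 2^(-f))) + 1.

    Rewritten as floor(2^F / (2^f - 1)) + 1 so it is exact in integers.
    """
    return (1 << F) // ((1 << f) - 1) + 1


def exponent(x: float) -> int:
    """Binary exponent of x as reported by math.frexp (0 for x == 0)."""
    return math.frexp(x)[1]


def sum_decreasing_exponent(xs, f: int = 24, F: int = 53) -> float:
    """Sum binary32-valued inputs in a binary64 accumulator.

    Assumption: every element of xs is exactly representable with f bits,
    and Python's float plays the role of the F-bit accumulator.
    """
    if len(xs) > max_terms(f, F):
        raise ValueError("too many summands for the stated bound")
    # Largest exponents first, as described in the abstract.
    ordered = sorted(xs, key=exponent, reverse=True)
    acc = 0.0
    for x in ordered:
        acc += x
    return acc


if __name__ == "__main__":
    data = [to_single(v) for v in (1e10, 3.0, -1e10, -3.0)]
    print(max_terms(24, 53))
    print(sum_decreasing_exponent(data))
```

For the single/double pairing the bound works out to 2^53 // (2^24 − 1) + 1 terms, which is far more than a typical in-memory list holds, so the length check rarely triggers in practice. The sort dominates the cost in this sketch; it says nothing about the performance figures reported in the abstract.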

Author information

Authors and Affiliations

Computer Science Division and Mathematics Department, University of California, Berkeley, CA, 94720, USA
James Demmel
Computer Science Division, University of California, Berkeley, CA, 94720, USA
Yozo Hida

About this article

Cite this article

Demmel, J., Hida, Y. Fast and Accurate Floating Point Summation with Application to Computational Geometry. Numerical Algorithms 37, 101–112 (2004). https://doi.org/10.1023/B:NUMA.0000049458.99541.38

Issue Date: December 2004
DOI: https://doi.org/10.1023/B:NUMA.0000049458.99541.38
