Abstract
We present several simple algorithms for accurately computing the sum of n floating point numbers using a wider accumulator. Let f and F be the number of significant bits in the summands and the accumulator, respectively. Then assuming gradual underflow, no overflow, and round-to-nearest arithmetic, up to $\lfloor 2^{F-f}/(1-2^{-f}) \rfloor + 1$ numbers can be accurately added by just summing the terms in decreasing order of exponents, yielding a sum correct to within about 1.5 units in the last place. In particular, if the sum is zero, it is computed exactly. We apply this result to the floating point formats in the IEEE floating point standard, and investigate its performance. Our results show that in the absence of massive cancellation (the most common case) the cost of guaranteed accuracy is about 30–40% more than the straightforward summation. If massive cancellation does occur, the cost of computing the accurate sum is about a factor of ten. Finally, we apply our algorithm in computing a robust geometric predicate (used in computational geometry), where our accurate summation algorithm improves the existing algorithm by a factor of two on a nearly coplanar set of points.
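The basic idea in the abstract (sort by decreasing exponent, then add into a wider accumulator) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes IEEE single precision summands (f = 24) and a double precision accumulator (F = 53), and the helper names `to_f32`, `max_terms`, `sorted_sum_f32`, and `naive_sum_f32` are hypothetical, introduced only for illustration.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_terms(f, F):
    """Term-count bound from the abstract: floor(2^(F-f) / (1 - 2^(-f))) + 1.

    Computed exactly with integers, using
    2^(F-f) / (1 - 2^(-f)) == 2^F / (2^f - 1).
    """
    return (1 << F) // ((1 << f) - 1) + 1


def sorted_sum_f32(xs):
    """Sum binary32 values in a binary64 accumulator, largest exponent first.

    Assumption: no overflow occurs, as in the abstract.
    """
    terms = [to_f32(x) for x in xs]
    if len(terms) > max_terms(24, 53):
        raise ValueError("too many terms for the accuracy bound to apply")
    # math.frexp returns (mantissa, exponent); sort by exponent, descending.
    terms.sort(key=lambda v: math.frexp(v)[1], reverse=True)
    acc = 0.0  # Python floats are binary64, i.e. the wider accumulator
    for v in terms:
        acc += v
    return to_f32(acc)  # round once, back to the summand format


def naive_sum_f32(xs):
    """Straightforward summation that rounds to binary32 after every add."""
    acc = 0.0
    for x in xs:
        acc = to_f32(acc + to_f32(x))
    return acc


data = [1e8, 1.0, -1e8]
print(naive_sum_f32(data))   # the 1.0 is lost when added to 1e8 in binary32
print(sorted_sum_f32(data))  # the wider accumulator keeps it
print(max_terms(24, 53))
```

For these formats the bound allows roughly half a billion terms, so in this setting the sort is the main extra cost over a plain loop.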
Cite this article
Demmel, J., Hida, Y. Fast and Accurate Floating Point Summation with Application to Computational Geometry. Numerical Algorithms 37, 101–112 (2004). https://doi.org/10.1023/B:NUMA.0000049458.99541.38
DOI: https://doi.org/10.1023/B:NUMA.0000049458.99541.38