Article

Parallel Reproducible Summation

Journal

IEEE TRANSACTIONS ON COMPUTERS
Volume 64, Issue 7, Pages 2060-2070

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TC.2014.2345391

Keywords

Reproducibility; summation; floating-point; rounding error; parallel computing; numerical analysis

Funding

  1. National Science Foundation (NSF) [NSF OCI-1032639, NSF ACI-1339676]
  2. US Department of Energy (DOE) [DOE DE-SC0010200, DOE DE-SC0003959, DOE DE-SC0005136, DOE DE-SC0008699, DOE DE-SC0008700, DOE DE-SC0004938, DOE AC02-05CH11231]
  3. DARPA [HR0011-12-2-0016]
  4. Intel
  5. Google
  6. Nokia
  7. NVIDIA
  8. Oracle

Abstract

Reproducibility, i.e., getting bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on for debugging or correctness checking in many codes [10]. However, the combination of dynamic scheduling of parallel computing resources and floating point non-associativity makes attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [7] and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm reproducibly computes highly accurate results with an absolute error bound of n · 2^(-28) · macheps · max_i |v_i| at a cost of 7n FLOPs and a small constant amount of extra memory. Higher accuracies are also possible by increasing the number of error-free transformations. As long as all operations are performed in round-to-nearest mode, results computed by the proposed algorithms are reproducible for any run on any platform. In particular, our algorithm requires the minimum number of reductions, i.e., one reduction of an array of six double precision floating point numbers per sum, and hence is well suited for massively parallel environments.
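The key idea behind the abstract, the error-free transformation, can be illustrated with a small sketch. This is not the paper's 7n-FLOP algorithm; it is a simplified illustration combining Knuth's TwoSum with a single fixed-boundary extraction in the style of Rump's ExtractVector, and the choice of `boundary` below is an assumption made for the example.

```python
def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and s + e == a + b
    exactly, i.e., an error-free transformation of the pair (a, b)."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

def transform(v, boundary):
    """One extraction step (illustrative simplification of Rump's scheme).

    Rounding each element against a large fixed 'boundary' splits it into a
    high-order part q (a multiple of the boundary's unit in the last place)
    and an exact residual x - q. Because all q share the same granularity,
    their sum 'high' is computed exactly and is therefore independent of
    summation order, which is the source of reproducibility.
    """
    high = 0.0
    residuals = []
    for x in v:
        q = (boundary + x) - boundary  # round x to the boundary's granularity
        high += q                      # exact for modest n: no rounding occurs
        residuals.append(x - q)        # error-free: q + (x - q) == x
    return high, residuals
```

For example, with `boundary = 2.0**53` the extracted parts are multiples of 2, so accumulating them is exact, and `high` plus the residuals still represents the input sum exactly; iterating the extraction on the residuals refines the result, which is the role of the repeated error-free transformations mentioned in the abstract.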
