Article

Parallel Reproducible Summation

Journal

IEEE TRANSACTIONS ON COMPUTERS
Volume 64, Issue 7, Pages 2060-2070

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TC.2014.2345391

Keywords

Reproducibility; summation; floating-point; rounding error; parallel computing; numerical analysis

Funding

  1. National Science Foundation (NSF) [NSF OCI-1032639, NSF ACI-1339676]
  2. US Department of Energy (DOE) [DOE DE-SC0010200, DOE DE-SC0003959, DOE DE-SC0005136, DOE DE-SC0008699, DOE DE-SC0008700, DOE DE-SC0004938, DOE AC02-05CH11231]
  3. DARPA [HR0011-12-2-0016]
  4. Intel
  5. Google
  6. Nokia
  7. NVIDIA
  8. Oracle

Abstract

Reproducibility, i.e., getting bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on for debugging or correctness checking in many codes [10]. However, the combination of dynamic scheduling of parallel computing resources and floating point non-associativity makes attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [7] and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm reproducibly computes highly accurate results with an absolute error bound of n · 2^(-28) · macheps · max_i |v_i| at a cost of 7n FLOPs and a small constant amount of extra memory. Higher accuracies are also possible by increasing the number of error-free transformations. As long as all operations are performed in round-to-nearest mode, results computed by the proposed algorithms are reproducible for any run on any platform. In particular, our algorithm requires the minimum number of reductions, i.e., one reduction of an array of six double precision floating point numbers per sum, and hence is well suited for massively parallel environments.
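The key idea behind the abstract, the error-free transformation, can be illustrated with a small sketch. This is not the paper's 7n-FLOP algorithm; it is a simplified illustration combining Knuth's TwoSum with a single fixed-boundary extraction in the style of Rump's ExtractVector, and the choice of `boundary` below is an assumption made for the example.

```python
def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and s + e == a + b
    exactly, i.e., an error-free transformation of the pair (a, b)."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

def transform(v, boundary):
    """One extraction step (illustrative simplification of Rump's scheme).

    Rounding each element against a large fixed 'boundary' splits it into a
    high-order part q (a multiple of the boundary's unit in the last place)
    and an exact residual x - q. Because all q share the same granularity,
    their sum 'high' is computed exactly and is therefore independent of
    summation order, which is the source of reproducibility.
    """
    high = 0.0
    residuals = []
    for x in v:
        q = (boundary + x) - boundary  # round x to the boundary's granularity
        high += q                      # exact for modest n: no rounding occurs
        residuals.append(x - q)        # error-free: q + (x - q) == x
    return high, residuals
```

For example, with `boundary = 2.0**53` the extracted parts are multiples of 2, so accumulating them is exact, and `high` plus the residuals still represents the input sum exactly; iterating the extraction on the residuals refines the result, which is the role of the repeated error-free transformations mentioned in the abstract.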
