☆ 4.5 Article

Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions

IEEE TRANSACTIONS ON COMPUTERS (2016)

Journal

IEEE TRANSACTIONS ON COMPUTERS

Volume 65, Issue 4, Pages 1197-1210

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TC.2015.2441714

Keywords

Floating-point arithmetic; floating-point expansions; high precision arithmetic; multiple-precision arithmetic; division; reciprocal; square root; Newton-Raphson iteration

Funding

Region Rhone-Alpes
ANR FastRelax Project

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

Many numerical problems require a higher computing precision than the one offered by standard floating-point (FP) formats. One common way of extending the precision is to represent numbers in a multiple component format. By using the so-called floating-point expansions, real numbers are represented as the unevaluated sum of standard machine precision FP numbers. This representation offers the simplicity of using directly available, hardware implemented and highly optimized, FP operations. It is used by multiple-precision libraries such as Bailey's QD or the analogue Graphics Processing Units (GPU) tuned version, GQD. In this article we briefly revisit algorithms for adding and multiplying FP expansions, then we introduce and prove new algorithms for normalizing, dividing and square rooting of FP expansions. The new method used for computing the reciprocal a(-1) and the square root root a of a FP expansion a is based on an adapted Newton-Raphson iteration where the intermediate calculations are done using truncated operations (additions, multiplications) involving FP expansions. We give here a thorough error analysis showing that it allows very accurate computations. More precisely, after q iterations, the computed FP expansion x = x(0) + ... + x(2q-1) satisfies, for the reciprocal algorithm, the relative error bound: vertical bar(x - a(-1))/a(-1)vertical bar <= 2(-2q(p-3)-1) and, respectively, for the square root one: vertical bar x - 1/root a vertical bar <= 2(-2q(p-3)-1)/root a, where p > 2 is the precision of the FP representation used (p = 24 for single precision and p = 53 for double precision).

Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions

Journal

IEEE TRANSACTIONS ON COMPUTERS

Publisher

IEEE COMPUTER SOC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions

Journal

IEEE TRANSACTIONS ON COMPUTERS

Publisher

IEEE COMPUTER SOC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper