☆ 4.5 Article

New distributed algorithms for fast sign detection in residue number systems (RNS)

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING (2016)

期刊

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

卷 97, 期 -, 页码 78-95

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jpdc.2016.06.005

关键词

Residue number systems; RNS; Reconstruction coefficient; Partial reconstruction; Reduced precision; Fast sign detection

类别

Computer Science, Theory & Methods

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

We identify a canonical parameter in the Chinese Remainder Theorem (CRT) and call it the Reconstruction Coefficient, (denoted by R-c); and introduce the notions of Partial and Full Reconstruction. If the R-c can be determined efficiently, then arithmetic operations that are (relatively) harder to realize in RNS; including Sign Detection, Base change/extension and Scaling or division by a constant can also be implemented efficiently. This paper therefore focuses on and presents two distinct methods to efficiently evaluate the R-c at long wordlengths. A straightforward application of these methods leads to ultra-fast sign-detection. An independent contribution of this paper is to illustrate non-trivial trade-offs between run-time computation vs. pre-computation and look-up. We show a simple method to select the moduli which leads to both the (i) number of RNS channels n; as well as (ii) the largest channel modulus m(n), satisfying {0(n), O(m(n))} (sic) N equivalent to the full-precision bit-length. The net result is that for many canonical operations; exhaustive look-up covering all possible input values is feasible even at long cryptographic bit-lengths N. Under fairly general and realistic assumptions about the capabilities of current hardware, the memory needed for exhaustive look-up tables is shown to be bounded by a low degree polynomial of n. Moreover, both methods to compute R-c can achieve a delay of O(lg n) in a RN system with n channels. To the best of our knowledge, no other method published to date has shown a path to achieve that lower bound on the execution delay. Further, small values of channel moduli make it ideal to implement each individual RNS channel on a simple core in a many-core processor or as a distributed node, and our algorithms require a limited number of inter-channel communications, averaging O(n). Results from a multi-core GPU implementation corroborate the theory. (C) 2016 Elsevier Inc. All rights reserved.

New distributed algorithms for fast sign detection in residue number systems (RNS)

期刊

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

New distributed algorithms for fast sign detection in residue number systems (RNS)

期刊

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文