Proceedings Paper

DLFloat: A 16-b Floating Point format designed for Deep Learning Training and Inference

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/ARITH.2019.00023

Keywords

reduced precision computation; floating point; machine learning; deep learning

Abstract

The resilience of Deep Learning (DL) training and inference workloads to low-precision computation, coupled with the demand for power- and area-efficient hardware accelerators for these workloads, has led to the emergence of 16-bit floating point formats as the precision of choice for DL hardware accelerators. This paper describes our optimized 16-bit format, which has 6 exponent bits and 9 fraction bits, derived from a study of the range of values encountered in DL applications. We demonstrate that our format preserves the accuracy of DL networks, and we compare its ease of use for DL against IEEE-754 half-precision (5 exponent bits and 10 fraction bits) and bfloat16 (8 exponent bits and 7 fraction bits). Further, our format eliminates subnormals and simplifies rounding modes and the handling of corner cases. This streamlines floating-point unit logic and enables the realization of a compact, power-efficient computation engine.
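
The quantization behavior the abstract describes can be illustrated with a short conversion routine. The following is a minimal sketch, not the paper's implementation: it packs a float32 value into a 1-6-9 DLFloat bit pattern assuming an exponent bias of 31, round-to-nearest-even, subnormal results flushed to zero, saturation of out-of-range magnitudes to the largest finite value, and a single merged NaN/infinity encoding. The function name and the exact corner-case encodings here are illustrative assumptions; the hardware's precise behavior follows the paper.

import struct

EXP_BITS, FRAC_BITS = 6, 9                  # DLFloat16: 1 sign, 6 exponent, 9 fraction bits
BIAS = (1 << (EXP_BITS - 1)) - 1            # assumed exponent bias = 31

def float32_to_dlfloat16(x: float) -> int:
    """Round a float32 value to the nearest DLFloat16 bit pattern (a sketch).

    Assumptions: round-to-nearest-even, subnormal results flushed to zero,
    out-of-range magnitudes saturated to the largest finite value, and one
    merged NaN/infinity pattern. Real corner-case encodings may differ.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    sign = bits >> 31
    exp32 = (bits >> 23) & 0xFF
    frac32 = bits & 0x7FFFFF
    max_finite = (((1 << EXP_BITS) - 2) << FRAC_BITS) | ((1 << FRAC_BITS) - 1)

    if exp32 == 0xFF:                       # float32 NaN or infinity
        return (sign << 15) | 0x7FFF        # single merged NaN/Inf pattern (assumed)

    e_dl = (exp32 - 127) + BIAS             # re-bias the exponent for the 6-bit field
    if e_dl <= 0:
        return sign << 15                   # too small: no subnormals, flush to signed zero
    if e_dl >= (1 << EXP_BITS) - 1:
        return (sign << 15) | max_finite    # too large: saturate

    # Round the 23-bit fraction to 9 bits, ties to even.
    shift = 23 - FRAC_BITS
    frac = frac32 >> shift
    rem = frac32 & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (frac & 1)):
        frac += 1
        if frac == (1 << FRAC_BITS):        # carry out of the fraction
            frac = 0
            e_dl += 1
            if e_dl >= (1 << EXP_BITS) - 1:
                return (sign << 15) | max_finite

    return (sign << 15) | (e_dl << FRAC_BITS) | frac

Under these assumptions, float32_to_dlfloat16(1.0) yields 0x3E00 (sign 0, exponent field 31, fraction 0), and values below the smallest normal magnitude round to signed zero rather than to a subnormal, which is the simplification the abstract credits for the leaner floating-point unit logic.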
