3.8 Proceedings Paper

Exploiting Randomness in Sketching for Efficient Hardware Implementation of Machine Learning Applications

出版社

ASSOC COMPUTING MACHINERY
DOI: 10.1145/2966986.2967038

关键词

-

资金

  1. Div Of Electrical, Commun & Cyber Sys
  2. Directorate For Engineering [1056028] Funding Source: National Science Foundation

向作者/读者索取更多资源

Energy-efficient processing of large matrices for big-data applications using hardware acceleration is an intense area of research. Sketching of large matrices into their lower-dimensional representations is an effective strategy. For the first time, this paper develops a highly energy-efficient hardware implementation of a class of sketching methods based on random projections, known as Johnson Lindenstrauss (JL) transform. Crucially, we show how the randomness inherent in the projection matrix can be exploited to create highly efficient fixed-point arithmetic realizations of several machine-learning applications. Specifically, the transform's random matrices have two key properties that allow for significant energy gains in hardware implementations. The first is the smoothing property that allows us to drastically reduce operand bit-width in computation of the JL transform itself. The second is the randomizing property that allows bit-width reduction in subsequent machine-learning applications. Further, we identify a random matrix construction method that exploits the special sparsity structure to result in the most hardware-efficient realization and implement the highly optimized transform on an FPGA. Experimental results on (1) the k-nearest neighbor (KNN) classification and (2) the principal component analysis (PCA) show that with the same bit-width the proposed flow utilizing random projection achieves an up to 7X improvement in both latency and energy. Furthermore, by exploiting the smoothing and randomizing properties we are able to use a 1-bit instead of a 4-bit multiplier within KNN, which results in additional 50% and 6% improvement in area and energy respectively. The proposed I/O streaming strategy along with the hardware-efficient JL algorithm identified by us is able to achieve a 50% runtime reduction, a 17% area reduction in the stage of random projection compared to a standard design.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

3.8
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据