3.8 Proceedings Paper

Exploiting Randomness in Sketching for Efficient Hardware Implementation of Machine Learning Applications

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/2966986.2967038

Keywords

-

Funding

  1. Div Of Electrical, Commun & Cyber Sys
  2. Directorate For Engineering [1056028] Funding Source: National Science Foundation

Ask authors/readers for more resources

Energy-efficient processing of large matrices for big-data applications using hardware acceleration is an intense area of research. Sketching of large matrices into their lower-dimensional representations is an effective strategy. For the first time, this paper develops a highly energy-efficient hardware implementation of a class of sketching methods based on random projections, known as Johnson Lindenstrauss (JL) transform. Crucially, we show how the randomness inherent in the projection matrix can be exploited to create highly efficient fixed-point arithmetic realizations of several machine-learning applications. Specifically, the transform's random matrices have two key properties that allow for significant energy gains in hardware implementations. The first is the smoothing property that allows us to drastically reduce operand bit-width in computation of the JL transform itself. The second is the randomizing property that allows bit-width reduction in subsequent machine-learning applications. Further, we identify a random matrix construction method that exploits the special sparsity structure to result in the most hardware-efficient realization and implement the highly optimized transform on an FPGA. Experimental results on (1) the k-nearest neighbor (KNN) classification and (2) the principal component analysis (PCA) show that with the same bit-width the proposed flow utilizing random projection achieves an up to 7X improvement in both latency and energy. Furthermore, by exploiting the smoothing and randomizing properties we are able to use a 1-bit instead of a 4-bit multiplier within KNN, which results in additional 50% and 6% improvement in area and energy respectively. The proposed I/O streaming strategy along with the hardware-efficient JL algorithm identified by us is able to achieve a 50% runtime reduction, a 17% area reduction in the stage of random projection compared to a standard design.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

3.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available