Article

Low-time-complexity document clustering using memristive dot product engine

Journal

SCIENCE CHINA-INFORMATION SCIENCES
Volume 65, Issue 2

Publisher

SCIENCE PRESS
DOI: 10.1007/s11432-021-3316-x

Keywords

linear-time clustering; cosine similarity; spherical K-means; memristor; in-memory computing

Funding

  1. National Key Research and Development Plan of MOST of China [2019YFB2205100]
  2. National Natural Science Foundation of China [61874164, 92064012, 61841404]
  3. Hubei Key Laboratory for Advanced Memories, Hubei Engineering Research Center on Microelectronics
  4. Chua Memristor Institute


This article introduces a method that accelerates document clustering with memristive in-memory computing, reducing time complexity by performing the similarity measurement in one step. It also proposes an in-situ normalization scheme that reduces the number of normalization steps during clustering, and it discusses how non-ideal memristor characteristics affect clustering tasks.
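To illustrate why cosine similarity maps naturally onto a dot product engine, the following minimal Python sketch runs textbook spherical K-means on unit-length vectors. It is not the authors' implementation. `crossbar_read` is a hypothetical software stand-in for one read of a memristive crossbar: on hardware, all K dot products between an input document and the stored centroids would appear as column currents in a single step, whereas this software version computes each dot product with d multiply-adds. Because every vector is normalized to unit length, each dot product equals the cosine similarity. The centroid re-normalization shown here is the standard spherical K-means update, not the paper's in-situ normalization scheme.

```python
import math
import random


def normalize(v):
    """Scale v to unit L2 length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def crossbar_read(centroids, x):
    """Hypothetical stand-in for one crossbar read: all K dot products <c_k, x>.

    On hardware the centroids would be stored as conductances and x applied
    as input voltages, so the K results arrive in one step; here we compute
    them in software.
    """
    return [sum(c_i * x_i for c_i, x_i in zip(c, x)) for c in centroids]


def spherical_kmeans(docs, k, iters=20, seed=0):
    """Textbook spherical K-means; returns (labels, unit-length centroids)."""
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]  # unit vectors: dot product == cosine
    centroids = [list(d) for d in rng.sample(docs, k)]
    labels = [0] * len(docs)
    for _ in range(iters):
        # Assignment: pick the centroid with the highest cosine similarity,
        # using one (emulated) crossbar read per document.
        labels = []
        for d in docs:
            sims = crossbar_read(centroids, d)
            labels.append(max(range(k), key=sims.__getitem__))
        # Update: mean of the member documents, re-normalized to unit length.
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids
```

For example, `spherical_kmeans([[3, 0, 1], [4, 0, 0], [0, 2, 5], [0, 3, 4]], k=2)` places the first two documents in one cluster and the last two in the other.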
Document clustering is widely used in data analysis. However, clustering on the von Neumann architecture requires a massive number of similarity measurements, which makes it very time-consuming. Memristive in-memory computing offers a new way to address this problem. In this article, using a memristive dot product engine, we demonstrate for the first time a document clustering method accelerated by cosine similarity computed in memory. By executing the similarity measurement in one step, the memristor-based clustering method lowers the time complexity from O(N·d) for the conventional algorithm to O(N). Because the method operates on unit-length vectors, we propose an in-situ normalization scheme for the vectors stored in the crossbar array, which provides an efficient hardware training scheme and reduces the number of normalization steps during clustering. Using the BBCSport dataset as a benchmark, we further discuss the impact of non-ideal memristor factors, including the number of available quantized states, inevitable programming noise, and device failure. Simulation results indicate that 6-bit quantized states and 5% programming noise are acceptable for document clustering tasks. In addition, high-resistance states are recommended for failed cells to obtain better clustering results.
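The non-ideal factors listed above can be explored in simulation by perturbing the stored weight matrix before each read. The sketch below uses a hypothetical device model chosen for illustration; the paper's exact model is not described in this abstract. Weights in [0, 1] are quantized to 2**bits evenly spaced levels and then scaled by multiplicative Gaussian programming noise. Each cell also fails independently with probability `fail_rate`, becoming stuck either at the high-resistance state (HRS, modeled as conductance 0) or the low-resistance state (LRS, modeled as conductance 1). The defaults `bits=6` and `noise=0.05` mirror the 6-bit and 5% figures quoted in the abstract. All names and parameters are assumptions.

```python
import random


def program_crossbar(weights, bits=6, noise=0.05, fail_rate=0.0,
                     stuck_at="HRS", seed=0):
    """Map non-negative weights in [0, 1] to simulated conductances.

    Hypothetical device model (an assumption, not the paper's):
      * quantization: 2**bits evenly spaced levels between 0 and 1
      * programming noise: multiplicative Gaussian with std = noise
      * failures: each cell independently stuck at HRS (0.0) or LRS (1.0)
    """
    rng = random.Random(seed)
    top = 2 ** bits - 1  # index of the highest level; levels are 0..top
    out = []
    for row in weights:
        new_row = []
        for w in row:
            w = min(max(w, 0.0), 1.0)
            g = round(w * top) / top              # snap to nearest quantized level
            g *= 1.0 + rng.gauss(0.0, noise)      # programming noise
            g = min(max(g, 0.0), 1.0)             # keep within device range
            if rng.random() < fail_rate:          # device failure
                g = 0.0 if stuck_at == "HRS" else 1.0
            new_row.append(g)
        out.append(new_row)
    return out
```

One plausible intuition for the HRS recommendation, under this simplified model only: a cell stuck at HRS merely drops one term from a dot product, whereas a cell stuck at LRS injects a spurious maximal term that can pull documents toward the wrong centroid.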
