4.7 Article

Deep Feature-Based Text Clustering and its Explanation

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 34, Issue 8, Pages 3669-3680

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2020.3028943

Keywords

Task analysis; Computational modeling; Feature extraction; Clustering algorithms; Semantics; Data models; Recurrent neural networks; Deep learning; explanation model; feature extraction; text clustering; transfer learning

Funding

  1. National Natural Science Foundation of China [61602207, 61972174]
  2. Science Technology Development Project of Jilin Province [20190302107GX]
  3. Special Research and Development of Industrial Technology of Jilin Province [2019C053-7]
  4. Guangdong Key Project for Applied Fundamental Research [2018KZDXM076]
  5. Guangdong Premier Key-Discipline Enhancement Scheme [2016GDYSZDXK036]
  6. National Project PRIN

Ask authors/readers for more resources

The paper proposes a deep feature-based text clustering framework that incorporates pretrained text encoders into clustering tasks. The model outperforms classic algorithms and BERT on multiple datasets and provides explanations for the clustering results, aiding the understanding of deep learning approaches.
Text clustering is a critical step in text data analysis and has been extensively studied by the text mining community. Most existing text clustering algorithms are based on the bag-of-words model, which faces the high-dimensional and sparsity problems and ignores text structural and sequence information. Deep learning-based models such as convolutional neural networks and recurrent neural networks regard texts as sequences but lack supervised signals and explainable results. In this paper, we propose a deep feature-based text clustering (DFTC) framework that incorporates pretrained text encoders into text clustering tasks. This model, which is based on sequence representations, breaks the dependency on supervision. The experimental results show that our model outperforms classic text clustering algorithms and the state-of-the-art pretrained language model, i.e., BERT, on almost all the considered datasets. In addition, the explanation of the clustering results is significant for understanding the principles of the deep learning approach. Our proposed clustering framework includes an explanation module that can help users understand the meaning and quality of the clustering results.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available