Article

A Full-Sample Clustering Model Considering Whole Process Optimization of Data

Journal

BIG DATA RESEARCH
Volume 28

Publisher

ELSEVIER
DOI: 10.1016/j.bdr.2021.100301

Keywords

Optimization of whole process; Principal component analysis; Self organizing maps; K-means cluster; Collaborative filtering

As data volume and dimensionality keep growing, it becomes increasingly difficult to improve the accuracy and interpretability of clustering by working on the clustering algorithm alone. To improve both the accuracy of the clustering algorithm and the interpretability of its results, we propose an improved feature selection and combined clustering model that considers whole-process optimization. In this model, we handle the data across the whole data-mining process, from preprocessing through clustering analysis. First, we preprocess the data and then apply a text weight + principal component analysis (PCA) feature selection algorithm to reduce the feature dimension and obtain the important features and data sets for clustering. Second, we perform clustering analysis with a combined model of an improved self-organizing map (SOM) neural network and K-means clustering, and establish evaluation indicators for the clustering algorithm. Third, we use collaborative filtering to cluster data sets that contain missing data, so that every sample obtains a clustering result. Finally, a case analysis verifies that the proposed model achieves high clustering accuracy and interpretability. (C) 2021 Published by Elsevier Inc.
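The abstract names the pipeline stages but not their details. The Python/NumPy code below is a minimal sketch of how such stages could be chained, under loudly stated assumptions: the text-weight step and the clustering evaluation indicators are omitted because they are not defined here; PCA keeps enough components to reach a hypothetical 90% explained-variance threshold; the SOM + K-means combination follows the common generic two-stage scheme (train a SOM, run K-means on its prototype vectors, label each sample by its best-matching unit), not the authors' improved SOM; and "collaborative filtering" is stood in for by a neighbour-based fill of missing features from the most similar complete samples. All function names and parameters (`pca_reduce`, `train_som`, `cf_impute`, `full_sample_cluster`, grid size, learning rate, neighbour count) are hypothetical.

```python
import numpy as np

# Hypothetical sketch of the stages named in the abstract; NOT the authors'
# implementation. Text weighting and evaluation indicators are omitted.


def pca_reduce(X, var_threshold=0.9):
    """Standardize X, then keep the fewest principal components whose
    cumulative explained variance reaches var_threshold (assumed 90%)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    Z = (X - mu) / sd
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(ratio, var_threshold)) + 1, len(s))
    return Z @ Vt[:k].T, (mu, sd, Vt[:k])


def train_som(X, rows=5, cols=5, epochs=20, lr0=0.5, seed=0):
    """Generic online SOM: Gaussian neighbourhood, linearly decaying rates."""
    rng = np.random.default_rng(seed)
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    W = X[rng.integers(0, len(X), rows * cols)].copy()  # start at random samples
    sigma0, steps, t = max(rows, cols) / 2.0, epochs * len(X), 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = t / steps
            lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 1e-3
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma**2))
            W += lr * h[:, None] * (x - W)  # pull BMU and grid neighbours toward x
            t += 1
    return W


def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's K-means; returns centroids and labels."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        new_C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                          for j in range(k)])  # keep old centroid if cluster empties
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, labels


def nearest(points, protos):
    """Index of the closest prototype for each row of points."""
    return np.argmin(((points[:, None, :] - protos[None]) ** 2).sum(-1), axis=1)


def cf_impute(x, X_complete, n_neighbors=5):
    """Stand-in for collaborative filtering: fill NaNs in x with the
    distance-weighted mean of its nearest complete rows, where distance
    uses only the features x actually has."""
    obs = ~np.isnan(x)
    d = np.sqrt(((X_complete[:, obs] - x[obs]) ** 2).sum(axis=1))
    nn = np.argsort(d)[:n_neighbors]
    w = 1.0 / (d[nn] + 1e-9)
    filled = x.copy()
    filled[~obs] = (w[:, None] * X_complete[nn][:, ~obs]).sum(axis=0) / w.sum()
    return filled


def full_sample_cluster(X, k=3):
    """Cluster complete rows via PCA -> SOM -> K-means, then impute and
    assign the incomplete rows so that every sample gets a label."""
    complete = ~np.isnan(X).any(axis=1)
    Xc = X[complete]
    P, (mu, sd, V) = pca_reduce(Xc)
    W = train_som(P)
    _, unit_labels = kmeans(W, k)  # cluster SOM prototypes, not raw rows
    labels = np.empty(len(X), dtype=int)
    labels[complete] = unit_labels[nearest(P, W)]
    for i in np.flatnonzero(~complete):
        p = ((cf_impute(X[i], Xc) - mu) / sd) @ V.T  # reuse the fitted PCA
        labels[i] = unit_labels[nearest(p[None], W)[0]]
    return labels


# Usage on synthetic data: three separated blobs, 10 rows missing feature 2.
rng = np.random.default_rng(1)
centers = np.array([[0.0] * 6, [8.0] * 6, [-8.0] * 6])
X = np.vstack([c + rng.normal(size=(40, 6)) for c in centers])
X[rng.choice(len(X), 10, replace=False), 2] = np.nan
labels = full_sample_cluster(X, k=3)
print(labels.shape)  # (120,)
```

In this sketch, running K-means on the SOM prototypes rather than on every sample keeps the second stage cheap (a few dozen prototype vectors instead of the full data set), and reusing the PCA fitted on complete rows lets imputed rows be mapped into the same reduced space before assignment.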