
A Full-Sample Clustering Model Considering Whole Process Optimization of Data

BIG DATA RESEARCH (2022)

Journal

BIG DATA RESEARCH

Volume 28

Publisher

ELSEVIER

DOI: 10.1016/j.bdr.2021.100301

Keywords

Optimization of whole process; Principal component analysis; Self-organizing maps; K-means clustering; Collaborative filtering


Abstract

As data volume and dimensionality keep growing, it becomes increasingly difficult to improve accuracy and interpretability by working on the clustering algorithm alone. To improve the accuracy of clustering and the interpretability of its results, we propose an improved feature selection and combined clustering model that considers whole-process optimization. In this model, we processed the data across the whole data-mining process and carried out clustering analysis. First, we preprocessed the data and then applied a text weight + principal component analysis (PCA) feature selection algorithm to reduce the feature dimension and obtain the important features and data sets for clustering. Second, we performed clustering analysis with a combined model of an improved self-organizing map (SOM) neural network and K-means, and established evaluation indicators for the clustering algorithm. Third, we used collaborative filtering to cluster data sets containing missing data, ensuring that every sample obtains a clustering result. Finally, a case analysis verified that the proposed model achieves high clustering accuracy and interpretability. (C) 2021 Published by Elsevier Inc.
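The abstract outlines the pipeline but does not specify the improved SOM, the text-weighting scheme, the evaluation indicators, or the exact collaborative-filtering rule. A minimal NumPy sketch of the general idea, under those gaps, might look as follows. It uses PCA via SVD for dimension reduction, a plain (not improved) online SOM, and K-means over the SOM prototype vectors. As an assumption, it also uses neighbour-weighted imputation in the spirit of user-based collaborative filtering, so that samples with missing values still receive a cluster. All function names (`pca_reduce`, `train_som`, `kmeans`, `cf_impute`, `full_sample_cluster`) and parameter defaults are hypothetical. Text weighting and the evaluation indicators are omitted.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project centred data onto its top principal axes (computed via SVD)."""
    mean = X.mean(axis=0)
    centred = X - mean
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    components = Vt[:n_components]
    return centred @ components.T, mean, components


def train_som(Z, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Basic online SOM (a plain SOM, NOT the paper's improved variant)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = Z[rng.choice(len(Z), rows * cols, replace=True)].copy()  # init from samples
    n_steps, step = epochs * len(Z), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Z)):
            frac = step / n_steps
            lr = lr0 * (1 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3    # shrinking neighbourhood radius
            bmu = np.argmin(((W - Z[i]) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)  # grid distance to BMU
            h = np.exp(-d2 / (2 * sigma ** 2))              # Gaussian neighbourhood
            W += lr * h[:, None] * (Z[i] - W)
            step += 1
    return W


def kmeans(P, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    C = P[rng.choice(len(P), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((P[:, None, :] - C[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                C[j] = P[labels == j].mean(axis=0)
    return C, labels


def cf_impute(x, X_complete, k=5):
    """Fill NaNs in x from its k nearest complete rows (distance on observed columns)."""
    obs = ~np.isnan(x)
    d = np.sqrt(((X_complete[:, obs] - x[obs]) ** 2).sum(axis=1))
    top = np.argsort(d)[:k]
    w = 1.0 / (1.0 + d[top])  # closer neighbours get larger weights
    filled = x.copy()
    filled[~obs] = (w[:, None] * X_complete[top][:, ~obs]).sum(axis=0) / w.sum()
    return filled


def full_sample_cluster(X, n_components=2, k=3, seed=0):
    """PCA -> SOM -> K-means on prototypes; incomplete rows are imputed first."""
    complete = ~np.isnan(X).any(axis=1)
    X_obs = X[complete]
    Z, mean, comps = pca_reduce(X_obs, n_components)
    W = train_som(Z, seed=seed)
    _, proto_labels = kmeans(W, k, seed=seed)
    X_filled = X.copy()
    for i in np.where(~complete)[0]:
        X_filled[i] = cf_impute(X[i], X_obs)
    Z_all = (X_filled - mean) @ comps.T
    bmus = np.argmin(((Z_all[:, None, :] - W[None]) ** 2).sum(axis=2), axis=1)
    return proto_labels[bmus]  # each sample inherits its BMU's cluster label


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(c, 0.3, size=(30, 4)) for c in (0.0, 5.0, 10.0)])
    X[0, 2] = np.nan   # simulate missing entries
    X[45, 1] = np.nan
    print(full_sample_cluster(X, n_components=2, k=3))
```

Running K-means on the SOM prototypes rather than on the raw samples is a common motivation for SOM + K-means combinations. K-means then works on a small, smoothed set of prototype vectors, and each sample inherits the label of its best-matching unit. Because Lloyd's K-means depends on its random initialization, the labels can vary with `seed`.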
