Article

How to Use K-means for Big Data Clustering?

Related references

Note: Only some of the references are listed.
Article Operations Research & Management Science

Less is more approach in optimization: a road to artificial intelligence

Nenad Mladenovic et al.

Summary: The Less Is More Approach (LIMA) emphasizes using minimal ingredients for optimal results. It has been applied successfully in various scientific and artistic disciplines and is now gaining traction in solving optimization problems. By defining dominance relations between algorithms, proposing a general LIMA algorithm, discussing the automatic inclusion of common ingredients, and gradually increasing complexity, the paper suggests that LIMA may pave the way from optimization to artificial intelligence and machine learning.

OPTIMIZATION LETTERS (2022)

Article Computer Science, Artificial Intelligence

Efficient k-nearest neighbor search based on clustering and adaptive k values

Antonio Javier Gallego et al.

Summary: The paper introduces the caKD+ algorithm, which combines various techniques to improve the efficiency of kNN search and outperforms 16 state-of-the-art methods on 10 datasets.

PATTERN RECOGNITION (2022)

Article Management

Less is more: simple algorithms for the minimum sum of squares clustering problem

Pawel Kalczynski et al.

Summary: The study proposes three algorithms for the clustering problem, demonstrating that good starting solutions combined with simple local search can achieve results comparable to, or even better than, more sophisticated algorithms used in the literature.

IMA JOURNAL OF MANAGEMENT MATHEMATICS (2022)

Article Computer Science, Interdisciplinary Applications

SOS-SDP: An Exact Solver for Minimum Sum-of-Squares Clustering

Veronica Piccialli et al.

Summary: This paper proposes an exact algorithm based on the branch-and-bound technique for the minimum sum-of-squares clustering problem. The algorithm computes the lower bound using a cutting-plane procedure and the upper bound using a constrained version of k-means. Instance-level constraints are incorporated in the branch-and-bound procedure to express the relationships between data points.

INFORMS JOURNAL ON COMPUTING (2022)

Article Computer Science, Interdisciplinary Applications

Parallel batch k-means for Big data clustering

Rasim M. Alguliyev et al.

Summary: This article introduces a new parallel batch clustering algorithm based on k-means that reduces computational complexity by splitting the dataset into multiple partitions. It also proposes a method to determine the optimal batch size. Experimental results show the practical applicability of the method for handling Big Data.

COMPUTERS & INDUSTRIAL ENGINEERING (2021)

Article Multidisciplinary Sciences

A New Sentence-Based Interpretative Topic Modeling and Automatic Topic Labeling

Olzhas Kozbagarov et al.

Summary: This article introduces a new conceptual approach for interpretative topic modeling, using sentences as the basic unit of analysis and employing sentence probability evaluations and clustering of sentence embeddings for topic modeling, allowing for explicit interpretation of topics.

SYMMETRY-BASEL (2021)

Article Computer Science, Artificial Intelligence

Memetic differential evolution methods for clustering problems

Pierluigi Mansueto et al.

Summary: The Euclidean Minimum Sum-of-Squares Clustering (MSSC) is a key model for clustering and has attracted much attention due to its NP-hardness. Recent research has focused on improving the classical K-MEANS algorithm by selecting starting configurations or using it as a local search method in a global optimization algorithm. This paper proposes a new implementation of the Memetic Differential Evolution (MDE) algorithm specifically designed for the MSSC problem, showing good quality and efficiency in comparison to existing methods.

PATTERN RECOGNITION (2021)

Review Computer Science, Information Systems

Scalable Clustering Algorithms for Big Data: A Review

Mahmoud A. Mahdi et al.

Summary: In the era of big data, traditional clustering algorithms incur high computational costs, which makes it challenging to process massive amounts of data accurately at crucial moments. Although various algorithms have been developed to ease the clustering process, many difficulties remain when dealing with large data volumes.

IEEE ACCESS (2021)

Article Computer Science, Artificial Intelligence

An efficient K-means clustering algorithm for tall data

Marco Capo et al.

DATA MINING AND KNOWLEDGE DISCOVERY (2020)

Article Computer Science, Artificial Intelligence

Textual data summarization using the Self-Organized Co-Clustering model

Margot Selosse et al.

PATTERN RECOGNITION (2020)

Article Computer Science, Artificial Intelligence

A simulated annealing-based maximum-margin clustering algorithm

Sattar Seifollahi et al.

COMPUTATIONAL INTELLIGENCE (2019)

Article Computer Science, Artificial Intelligence

HG-MEANS: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering

Daniel Gribel et al.

PATTERN RECOGNITION (2019)

Article Computer Science, Artificial Intelligence

To cluster, or not to cluster: An analysis of clusterability methods

Andreas Adolfsson et al.

PATTERN RECOGNITION (2019)

Article Computer Science, Artificial Intelligence

How much can k-means be improved by using better initialization and repeats?

Pasi Franti et al.

PATTERN RECOGNITION (2019)

Article Computer Science, Artificial Intelligence

I-k-means-+: An iterative clustering algorithm based on an enhanced version of the k-means

Hassan Ismkhan

PATTERN RECOGNITION (2018)

Article Computer Science, Artificial Intelligence

Clustering in large data sets with the limited memory bundle method

Napsu Karmitsa et al.

PATTERN RECOGNITION (2018)

Article Operations Research & Management Science

J-means and I-means for minimum sum-of-squares clustering on networks

Alexey Nikolaev et al.

OPTIMIZATION LETTERS (2017)

Article Mathematics, Interdisciplinary Applications

On Strategies to Fix Degenerate k-means Solutions

Daniel Aloise et al.

JOURNAL OF CLASSIFICATION (2017)

Article Computer Science, Artificial Intelligence

Fast density clustering strategies based on the k-means algorithm

Liang Bai et al.

PATTERN RECOGNITION (2017)

Article Computer Science, Artificial Intelligence

Multimodal Deep Autoencoder for Human Pose Recovery

Chaoqun Hong et al.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2015)

Article Computer Science, Information Systems

Scalable K-Means++

Bahman Bahmani et al.

PROCEEDINGS OF THE VLDB ENDOWMENT (2012)

Article Computer Science, Theory & Methods

k-means Requires Exponentially Many Iterations Even in the Plane

Andrea Vattani

DISCRETE & COMPUTATIONAL GEOMETRY (2011)

Article Computer Science, Artificial Intelligence

Data clustering: 50 years beyond K-means

Anil K. Jain

PATTERN RECOGNITION LETTERS (2010)

Article Computer Science, Artificial Intelligence

NP-hardness of Euclidean sum-of-squares clustering

Daniel Aloise et al.

MACHINE LEARNING (2009)

Article Computer Science, Artificial Intelligence

A survey of kernel and spectral methods for clustering

Maurizio Filippone et al.

PATTERN RECOGNITION (2008)

Article Computer Science, Artificial Intelligence

J-MEANS: a new local search heuristic for minimum sum of squares clustering

P Hansen et al.

PATTERN RECOGNITION (2001)