Article

A Cheap Feature Selection Approach for the K-Means Algorithm

Journal

IEEE Transactions on Neural Networks and Learning Systems
Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TNNLS.2020.3002576

Keywords

Dimensionality reduction; K-means clustering; feature selection; parallelization; unsupervised learning

Funding

  1. Basque Government through the BERC 2014-2017 Program
  2. ELKARTEK Program
  3. Spanish Ministry of Economy and Competitiveness (MINECO): BCAM Severo Ochoa excellence accreditation [SVP-2014-068574, SEV-2013-0323]
  4. Agencia Estatal de Investigación (AEI)/Fondo Europeo de Desarrollo Regional (FEDER) [TIN2017-02626-R]
  5. Basque Government under the BERC 2018-2021 Program [IT-1244-19]
  6. Spanish Ministry of Science, Innovation and Universities: BCAM Severo Ochoa accreditation [SEV2017-0718, TIN2016-78365-R, PID2019-104966GB-100]
  7. European Union (EU)
  8. Grant "Artificial Intelligence in BCAM" [EXP. 2019/00432]


This article proposes a feature selection technique for the K-means algorithm, built on a novel feature relevance measure, that aims to reduce both the K-means error and the computational cost. Experimental results show that the proposed method consistently outperforms other feature selection and feature extraction techniques in both clustering quality and computational time.
The increase in the number of features that need to be analyzed in a wide variety of areas, such as genome sequencing, computer vision, or sensor networks, represents a challenge for the K-means algorithm. In this regard, different dimensionality reduction approaches for the K-means algorithm have been designed recently, leading to algorithms that have been shown to generate competitive clusterings. Unfortunately, most of these techniques tend to have fairly high computational costs and/or might not be easy to parallelize. In this article, we propose a fully parallelizable feature selection technique intended for the K-means algorithm. The proposal is based on a novel feature relevance measure that is closely related to the K-means error of a given clustering. Given a disjoint partition of the features, the technique consists of obtaining a clustering for each subset of features and selecting the m features with the highest relevance measure. The computational cost of this approach is just O(m · max{n·K, log m}) per subset of features. We additionally provide a theoretical analysis of the quality of the solution obtained via our proposal and empirically compare its performance with well-known feature selection and feature extraction techniques. This analysis shows that our proposal consistently obtains clusterings with a lower K-means error than all the considered feature selection techniques (Laplacian scores, maximum variance, multicluster feature selection, and random selection), while requiring similar or lower computational times than these approaches. Moreover, when compared with feature extraction techniques, such as random projections, the proposed approach also shows a noticeable improvement in both error and computational time.
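The selection scheme sketched in the abstract (partition the features into disjoint blocks, cluster each block with K-means, score every feature, keep the top m) can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the paper defines its own relevance measure tied to the K-means error, which is not reproduced here, so this sketch substitutes a simple proxy (per-feature between-cluster variance, i.e., how much of a feature's spread the block's clustering explains). The helper names `_kmeans`, `_init_centers`, and `select_features` are invented for illustration.

```python
import numpy as np

def _init_centers(X, K, rng):
    # Farthest-point initialization: robust for well-separated clusters.
    C = [X[rng.integers(len(X))].astype(float)]
    for _ in range(K - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[d.argmax()].astype(float))
    return np.array(C)

def _kmeans(X, K, iters=25, seed=0):
    # Plain Lloyd's algorithm (illustrative, not optimized).
    rng = np.random.default_rng(seed)
    C = _init_centers(X, K, rng)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            pts = X[labels == k]
            if len(pts):
                C[k] = pts.mean(axis=0)
    return labels

def select_features(X, K, m, n_blocks=4, seed=0):
    """Cluster each disjoint feature block, score features, keep the top m.

    Relevance score here is a PROXY: per-feature between-cluster variance
    (total variance minus within-cluster variance along that feature).
    The paper's actual measure is defined in terms of the K-means error.
    Each block is independent, so the loop below is trivially parallelizable.
    """
    n, d = X.shape
    blocks = np.array_split(np.arange(d), n_blocks)
    relevance = np.empty(d)
    for block in blocks:
        Xb = X[:, block]
        labels = _kmeans(Xb, K, seed=seed)
        within = np.zeros(len(block))
        for k in range(K):
            pts = Xb[labels == k]
            if len(pts):
                within += ((pts - pts.mean(axis=0)) ** 2).sum(axis=0)
        total = ((Xb - Xb.mean(axis=0)) ** 2).sum(axis=0)
        relevance[block] = total - within  # between-cluster spread
    return np.argsort(relevance)[::-1][:m]
```

On data where only a few features carry cluster structure, the informative features dominate the relevance ranking and the noise features are discarded, which matches the abstract's claim that the selected subset preserves clustering quality while shrinking the dimensionality.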

