4.7 Article

Dendrograms, minimum spanning trees and feature selection

Journal

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH
Volume 308, Issue 2, Pages 555-567

Publisher

ELSEVIER
DOI: 10.1016/j.ejor.2022.11.031

Keywords

Combinatorial optimization; Feature selection; Hierarchical clustering; Single linkage; Minimum spanning tree

Ask authors/readers for more resources

This study introduces the problem of jointly determining a set of features and a dendrogram according to the single linkage method, proposing different formulations and studying different bounds on the objective function. The effectiveness of the different models is discussed through extensive computational study, comparing the model with valid inequalities to the decomposition algorithm. The computational results also demonstrate that integrating feature selection into the optimization model allows for a satisfactory percentage of information to be preserved.
Feature selection is a fundamental process to avoid overfitting and to reduce the size of databases with-out significant loss of information that applies to hierarchical clustering. Dendrograms are graphical rep-resentations of hierarchical clustering algorithms that for single linkage clustering can be interpreted as minimum spanning trees in the complete network defined by the database. In this work, we introduce the problem that determines jointly a set of features and a dendrogram, according to the single linkage method. We propose different formulations that include the minimum spanning tree problem constraints as well as the feature selection constraints. Different bounds on the objective function are studied. For one of the models, several families of valid inequalities are proposed and the problem of separating them is studied. For another formulation, a decomposition algorithm is designed. In an extensive computa-tional study, the effectiveness of the different models is discussed, the model with valid inequalities is compared with the decomposition algorithm. The computational results also illustrate that the integration of feature selection to the optimization model allows to keep a satisfactory percentage of information.(c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by-nc-nd/4.0/ )

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available