4.5 Article

Implications of the curse of dimensionality for supervised learning classifier systems: theoretical and empirical analyses

Journal

PATTERN ANALYSIS AND APPLICATIONS
Volume 22, Issue 2, Pages 519-536

Publisher

SPRINGER
DOI: 10.1007/s10044-017-0649-0

Keywords

Rule-based systems; Genetic-based learning; Learning classifier systems; Ensemble learning; Data classification; High-dimensional data

Ask authors/readers for more resources

Learning classifier systems are leading evolutionary machine learning systems that employ genetic algorithms to search for a set of optimally general and correct classification rules for a variety of machine learning problems, including supervised classification data mining problems. The curse of dimensionality is a phenomenon that arises when analysing data in high-dimensional spaces. Performance issues when dealing with increasing dimensionality in the training data, such as poor classification accuracy and stalled genetic search, are well known for learning classifier systems. However, a systematic study to establish the relationship between increasing dimensionality and learning challenges in these systems is lacking. The aim of this paper is to analyse the behaviour of Michigan-style learning classifier systems that use the most commonly adopted and expressive interval-based rules representation, under curse of dimensionality (also known as Hughes Phenomenon) problem. In this paper, we use well-established and mathematically founded formal geometrical properties of high-dimensional data spaces and generalisation theory of these systems to propose a formulation of such relationship. The proposed formulations are validated experimentally using a set of synthetic, two-class classification problems. The findings demonstrate that the curse of dimensionality occurs for as few as ten dimensions and negatively affects the evolutionary search with a hyper-rectangular rule representation. A number of approaches to overcome some of the difficulties uncovered by the presented analysis are then discussed. Three approaches are then analysed in more detail using a set of synthetic, two-class classification problems. Experimental study demonstrates the effectiveness of these approaches to handle increasing dimensional data.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available