Article

Analysis of complexity indices for classification problems: Cancer gene expression data

Journal

NEUROCOMPUTING
Volume 75, Issue 1, Pages 33-42

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2011.03.054

Keywords

Classification; Gene expression data; Complexity indices; Linear separability

Funding

  1. Brazilian research agency CNPq
  2. Brazilian research agency CAPES
  3. Brazilian research agency FACEPE
  4. Brazilian research agency UFABC

The analysis of gene expression data has made cancer diagnosis at the molecular level possible. Typically, machine learning (ML) techniques are used to build automatic diagnosis models (classifiers) from cancer gene expression data. Such data often exhibit characteristics, including data sparsity and an unbalanced class distribution, that can harm the generalization ability of the resulting classifiers. We investigate the results of a set of indices that extract intrinsic complexity information from the data. These measures can be used to analyze, among other things, which characteristics of cancer gene expression data most affect the predictive ability of support vector machine classifiers. In this context, we also show that applying a suitable feature selection procedure to the data reduces the influence of those characteristics on the error rates of the induced classifiers. © 2011 Elsevier B.V. All rights reserved.
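The abstract does not name the indices used, so the following is only an illustrative sketch of what data complexity indices of this kind can look like, not the authors' method. It assumes a two-class problem and computes three simple quantities related to the characteristics mentioned above: the maximum Fisher's discriminant ratio over features (a common separability measure), the class imbalance ratio, and the number of samples per feature as a rough sparsity indicator. The function names `fisher_ratio`, `imbalance_ratio`, and `samples_per_feature` are hypothetical and chosen here for illustration.

```python
import numpy as np


def fisher_ratio(X, y):
    """Maximum Fisher's discriminant ratio over features (two classes only).

    For each feature: (mean_a - mean_b)^2 / (var_a + var_b).
    Larger values suggest the classes are easier to separate on that feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")
    a, b = X[y == classes[0]], X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = num / den
    # A feature that is constant and identical in both classes gives 0/0;
    # treat it as carrying no discriminative information.
    ratios = np.where(np.isnan(ratios), 0.0, ratios)
    return float(ratios.max())


def imbalance_ratio(y):
    """Size of the largest class divided by the size of the smallest class."""
    _, counts = np.unique(np.asarray(y), return_counts=True)
    return float(counts.max() / counts.min())


def samples_per_feature(X):
    """Number of samples divided by number of features (low means sparse)."""
    n_samples, n_features = np.asarray(X).shape
    return n_samples / n_features


if __name__ == "__main__":
    # Toy data standing in for an expression matrix (rows: samples, columns: genes).
    X = [[0.0, 5.0], [2.0, 5.0], [10.0, 5.0], [12.0, 5.0]]
    y = [0, 0, 1, 1]
    print(fisher_ratio(X, y), imbalance_ratio(y), samples_per_feature(X))
```

On real gene expression data, where features typically far outnumber samples, the samples-per-feature value would be well below 1, and comparing such indices before and after feature selection is one way to see how selection changes the difficulty of the problem.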
