Article

Subspace learning for unsupervised feature selection via matrix factorization

Journal

PATTERN RECOGNITION
Volume 48, Issue 1, Pages 10-19

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2014.08.004

Keywords

Machine learning; Feature selection; Unsupervised learning; Matrix factorization; Subspace distance; Kernel method

Funding

  1. National Natural Science Foundation of China [61170128, 61379049, 71379089]
  2. Natural Science Foundation of Fujian Province, China [2012J01294]
  3. Science and Technology Key Project of Fujian Province, China [2012H0043]

Dimensionality reduction is an important and challenging task in machine learning and data mining. Feature selection and feature extraction are two commonly used techniques for reducing the dimensionality of data and increasing the efficiency of learning algorithms. Feature selection in the absence of class labels, namely unsupervised feature selection, is both challenging and interesting. In this paper, we propose a new unsupervised feature selection criterion developed from the viewpoint of subspace learning, which is treated as a matrix factorization problem. The advantages of this work are four-fold. First, building on the technique of matrix factorization, a unified framework is established for feature selection, feature extraction and clustering. Second, an iterative update algorithm is provided via matrix factorization, which is an efficient technique for dealing with high-dimensional data. Third, an effective method for feature selection with numeric data is put forward that does not rely on a discretization process. Fourth, the new criterion provides a sound foundation for embedding kernel tricks into feature selection. In this regard, an algorithm based on kernel methods is also proposed. The algorithms are compared with four state-of-the-art feature selection methods on six publicly available datasets. Experimental results demonstrate that, in terms of clustering results, the two proposed algorithms outperform the others on almost all of the datasets tested. © 2014 Elsevier Ltd. All rights reserved.
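
The abstract does not spell out the objective or the update rules, so the following is only a minimal sketch of how feature selection can be posed as a matrix factorization problem. It assumes a nonnegative data matrix X (samples by features), a reconstruction objective of the form ||X - XWH||_F^2 with nonnegative W and H, and a soft orthogonality penalty on W weighted by `alpha`. The function name `mf_feature_selection`, the parameters, and the scoring by row norms of W are all illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np

def mf_feature_selection(X, k, n_iter=200, alpha=1.0, eps=1e-10, seed=0):
    """Illustrative sketch: rank features through a factorization X ~ X W H.

    Assumes X is nonnegative, so that the multiplicative updates below
    keep W and H nonnegative. All names and defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    G = X.T @ X                    # d x d Gram matrix of the features
    W = rng.random((d, k))         # feature weights (rows index features)
    H = rng.random((k, d))         # coefficients that rebuild all features
    for _ in range(n_iter):
        # Multiplicative updates for
        #   ||X - X W H||_F^2 + (alpha / 2) * ||W^T W - I||_F^2
        # obtained by splitting each gradient into positive and negative parts.
        W *= (G @ H.T + alpha * W) / (G @ W @ H @ H.T + alpha * W @ W.T @ W + eps)
        H *= (W.T @ G) / (W.T @ G @ W @ H + eps)
    scores = np.linalg.norm(W, axis=1)        # one score per feature
    selected = np.argsort(scores)[::-1][:k]   # indices of the k largest scores
    return selected, W, H

# Usage on synthetic nonnegative data (30 samples, 8 features).
X_demo = np.abs(np.random.default_rng(1).normal(size=(30, 8)))
selected_demo, W_demo, H_demo = mf_feature_selection(X_demo, k=3)
```

Scoring features by the row norms of W is one common way to read a selection out of a factorization of this kind: a feature whose row in W is close to zero contributes little to the learned subspace. A kernel variant would, under the same assumptions, replace the Gram matrix `G` with a kernel matrix computed between features.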
