Article

Denoising in Representation Space via Data-Dependent Regularization for Better Representation

Journal

MATHEMATICS
Volume 11, Issue 10, Article 2327

Publisher

MDPI
DOI: 10.3390/math11102327

Keywords

deep neural network; representation space; fully connected layer; feature extractor


In this paper, the representation learning problem is studied from an out-of-distribution (OoD) perspective to identify the factors affecting representation quality. The concept of out-of-feature subspace (OoFS) noise is introduced, and reducing it is proven to yield better representations. A novel data-dependent regularizer is proposed to reduce noise in the representations and achieve better performance across multiple tasks.
Despite the success of deep learning models, it remains challenging for over-parameterized models to learn good representations in small-sample-size settings. In this paper, motivated by previous work on out-of-distribution (OoD) generalization, we study the representation learning problem from an OoD perspective to identify the fundamental factors affecting representation quality. We formulate the notion of out-of-feature subspace (OoFS) noise for the first time, and we link the OoFS noise in the feature extractor to the OoD performance of the model by proving two theorems which show that reducing OoFS noise in the feature extractor yields better representations. Moreover, we identify two causes of OoFS noise and prove that the OoFS noise induced by random initialization can be filtered out via L2 regularization. Finally, we propose a novel data-dependent regularizer that acts on the weights of the fully connected layer to reduce noise in the representations, thus implicitly forcing the feature extractor, via back-propagation, to focus on informative features and to rely less on noise. Experiments on synthetic datasets show that our method learns hard-to-learn features, filters out noise effectively, and outperforms GD, AdaGrad, and KFAC. Furthermore, experiments on benchmark datasets show that our method achieves the best performance on three of the four tasks.
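
The exact form of the proposed regularizer is not reproduced on this page, so the following is only a minimal PyTorch sketch of the general idea under stated assumptions: a data-dependent penalty on the final fully connected layer's weights that suppresses the component of those weights lying outside the subspace spanned by the batch features (the out-of-feature subspace). The function oofs_penalty, the projection-based penalty, and the coefficient lam are illustrative assumptions, not the authors' formulation.

```python
import torch

def oofs_penalty(fc_weight: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
    """Hypothetical penalty: energy of the final-layer weights that lies
    outside the subspace spanned by the current batch of features.

    fc_weight: (num_classes, d) weight of the fully connected layer.
    feats:     (batch, d) representations from the feature extractor.
    """
    # Orthonormal basis of the feature subspace via thin QR on feats^T (d x batch).
    q, _ = torch.linalg.qr(feats.t())   # q: (d, r), r = min(d, batch)
    in_span = (fc_weight @ q) @ q.t()   # projection of weight rows onto span(feats)
    residual = fc_weight - in_span      # out-of-feature-subspace component
    return residual.pow(2).sum()

# Illustrative training step: the penalty is data-dependent, so minimizing it
# back-propagates through feats into the feature extractor as well.
# logits = fc(feats)
# loss = criterion(logits, labels) + lam * oofs_penalty(fc.weight, feats)
# loss.backward()
```

Because the penalty depends on the batch features, back-propagating it sends gradients through the feature extractor as well as the fully connected layer, which matches the implicit mechanism the abstract describes.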
