Article

Contrasting random and learned features in deep Bayesian linear regression

Journal

PHYSICAL REVIEW E
Volume 105, Issue 6

Publisher

AMER PHYSICAL SOC
DOI: 10.1103/PhysRevE.105.064118


Funding

  1. Google Faculty Research Award
  2. National Science Foundation, Directorate for Mathematical and Physical Sciences, Division of Mathematical Sciences [2134157]


Abstract

Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch. We show that both models display samplewise double-descent behavior in the presence of label noise. Random feature models can also display modelwise double descent if there are narrow bottleneck layers, while deep networks do not show these divergences. Random feature models can have particular widths that are optimal for generalization at a given data density, while making neural networks as wide or as narrow as possible is always optimal. Moreover, we show that the leading-order correction to the kernel-limit learning curve cannot distinguish between random feature models and deep networks in which all layers are trained. Taken together, our findings begin to elucidate how architectural details affect generalization performance in this simple class of deep regression models.
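To make the comparison in the abstract concrete, the following is a minimal NumPy sketch of the two model classes on unstructured Gaussian data with a noisy linear teacher. It is not the paper's Bayesian posterior-mean calculation: the random feature model is fit by ridge regression on its readout (a MAP stand-in for the Bayesian treatment), and the "all layers trained" deep linear network is represented by its collapsed end-to-end linear map, which is what full training of a linear network ultimately learns. All dimensions, the teacher vector, and the ridge scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unstructured Gaussian data with a noisy linear teacher (assumed setup)
d, p, n, n_test = 50, 100, 40, 1000   # input dim, hidden width, train/test sizes
sigma = 0.1                            # label noise level
w_star = rng.normal(size=d) / np.sqrt(d)

X = rng.normal(size=(n, d))
y = X @ w_star + sigma * rng.normal(size=n)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_star

# Deep random feature model: two fixed random linear layers, only the
# readout weights are fit (ridge regression as a MAP stand-in)
F1 = rng.normal(size=(d, p)) / np.sqrt(d)
F2 = rng.normal(size=(p, p)) / np.sqrt(p)

def features(Z):
    return Z @ F1 @ F2

lam = 1e-3                             # small ridge penalty (assumed prior scale)
Phi = features(X)
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)
err_rf = np.mean((features(X_test) @ a - y_test) ** 2)

# Fully trained deep linear network: the product of trained linear layers
# collapses to one effective d-dimensional map, fit with the same penalty
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
err_full = np.mean((X_test @ w - y_test) ** 2)

print(f"random features test MSE: {err_rf:.4f}")
print(f"trained layers test MSE:  {err_full:.4f}")
```

Sweeping `p` at fixed `n` in this sketch is one way to probe the width-dependent behavior the abstract describes for random feature models, since only the random-feature branch depends on the hidden width.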

