Article

DIFFERENCE-OF-CONVEX LEARNING: DIRECTIONAL STATIONARITY, OPTIMALITY, AND SPARSITY

Journal

SIAM JOURNAL ON OPTIMIZATION
Volume 27, Issue 3, Pages 1637-1665

Publisher

SIAM PUBLICATIONS
DOI: 10.1137/16M1084754

Keywords

statistical learning; difference-of-convex programs; directional stationary points; optimality; sparsity

Funding

  1. National Science Foundation [CMMI-1333902, IIS-1632971, DMS-1222507, DMS-1522383, IIS-1632935]
  2. Air Force Office of Scientific Research [FA9550-15-1-0126]

Abstract

This paper studies a fundamental bicriteria optimization problem for variable selection in statistical learning; the two criteria are a loss/residual function and a model control function (also called a regularizer or penalty). The former measures the fit of the learning model to the data, and the latter controls the complexity of the model. We focus on the case where the loss function is (strongly) convex and the model control function is a difference-of-convex (dc) sparsity measure. Our paper establishes fundamental optimality and sparsity properties of directional stationary solutions to a nonconvex Lagrangian formulation of the bicriteria optimization problem, based on a specially structured dc representation of many well-known sparsity functions that can be profitably exploited in the analysis. We relate the Lagrangian optimization problem to the penalty-constrained problem in terms of their respective d(irectional)-stationary solutions; this is in contrast to common analyses that pertain to the (global) minimizers of the problem, which are not computable due to nonconvexity. Most importantly, we provide sufficient conditions under which the d-stationary solutions of the nonconvex Lagrangian formulation are global minimizers (possibly restricted due to nondifferentiability), thereby bridging the gap between previous minimizer-based analyses and practical computational considerations. The established relation allows us to readily apply the results derived for the Lagrangian formulation to the penalty-constrained formulation. Specializations of the conditions to exact and surrogate sparsity functions are discussed, yielding optimality and sparsity results for existing nonconvex formulations of the statistical learning problem.
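
To make the setup concrete, the following is a minimal LaTeX sketch of the Lagrangian formulation, the standard notion of d(irectional)-stationarity, and one well-known dc sparsity surrogate. The symbols (loss $\ell$, penalty $P$, weight $\lambda$, cap parameter $a$) are illustrative notation rather than the paper's own, and the capped-$\ell_1$ example is a standard instance of the dc structure described in the abstract, not necessarily the representation used in the paper.

% Lagrangian bicriteria formulation with a dc sparsity penalty
\[
  \min_{w \in \mathbb{R}^n} \; \ell(w) + \lambda\, P(w),
  \qquad P(w) \;=\; g(w) - h(w),
\]
where $\ell$ is the (strongly) convex loss, $\lambda > 0$ trades data fit against model complexity, and the sparsity measure $P$ is the difference of two convex functions $g$ and $h$. A point $\bar{w}$ is d(irectional)-stationary when no direction of first-order descent exists:
\[
  \ell'(\bar{w}; d) + \lambda\, P'(\bar{w}; d) \;\ge\; 0
  \qquad \text{for all } d \in \mathbb{R}^n,
\]
with $f'(\bar{w}; d)$ denoting the one-sided directional derivative. The capped-$\ell_1$ surrogate illustrates the dc representation:
\[
  P(w) \;=\; \sum_{i=1}^{n} \min\{\lvert w_i \rvert,\, a\}
  \;=\; \underbrace{\lVert w \rVert_1}_{g(w)}
  \;-\; \underbrace{\sum_{i=1}^{n} \max\{\lvert w_i \rvert - a,\, 0\}}_{h(w)},
  \qquad a > 0,
\]
where both $g$ and $h$ are convex and piecewise linear.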
