Article

On the trade-off between number of examples and precision of supervision in machine learning problems

Journal

OPTIMIZATION LETTERS
Volume 15, Issue 5, Pages 1711-1733

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11590-019-01486-x

Keywords

Optimal supervision time; Linear regression; Variance control; Ordinary least squares; Large-sample approximation


This study examines the trade-off between the number of examples and their precision in linear regression problems where the conditional variance of the output can be controlled by varying the computational time dedicated to supervision. It demonstrates that in some cases, a smaller number of good examples may lead to lower generalization error than a larger number of bad examples. The results highlight the importance of collecting more reliable examples rather than simply increasing the size of the dataset.
We investigate linear regression problems in which one is additionally able to control the conditional variance of the output given the input by varying the computational time dedicated to supervising each example. For a given upper bound on the total computational time for supervision, we optimize the trade-off between the number of examples and their precision (the reciprocal of the conditional variance of the output) by formulating and solving suitable optimization problems, based on large-sample approximations of the outputs of the classical ordinary least squares and weighted least squares regression algorithms. Considering a specific functional form for that precision, we prove that there are cases in which many but bad examples provide a smaller generalization error than few but good ones, but also that the converse can occur, depending on the returns to scale of the precision with respect to the computational time assigned to supervising each example. Hence, the results of this study highlight that increasing the size of the dataset is not always beneficial if one has the possibility to collect a smaller number of more reliable examples. We conclude by presenting numerical results that validate the theory and by discussing extensions of the proposed framework to other optimization problems.
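The returns-to-scale dichotomy described in the abstract can be illustrated with a minimal numerical sketch. Assume, purely for illustration (this power law is an assumption, not necessarily the paper's exact functional form), that the precision of a label scales as t**alpha in the per-example supervision time t, and use the classical large-sample rate d * sigma^2 / n for the OLS generalization error:

```python
import numpy as np

def ols_generalization_error(n, total_time, alpha, d=1.0):
    """Large-sample approximation of the OLS generalization error when a
    total supervision budget `total_time` is split evenly over `n` examples
    and the label precision (reciprocal of the output's conditional
    variance) scales as t**alpha in the per-example time t.
    NOTE: the power-law precision is an illustrative assumption."""
    t = total_time / n          # supervision time per example
    variance = t ** (-alpha)    # conditional variance = 1 / precision
    return d * variance / n     # classical d * sigma^2 / n error rate

T = 100.0
ns = np.array([10, 20, 50, 100])

# Decreasing returns to scale (alpha < 1): error shrinks as n grows,
# so many imprecise examples beat few precise ones.
err_dec = ols_generalization_error(ns, T, alpha=0.5)
print(np.all(np.diff(err_dec) < 0))  # True

# Increasing returns to scale (alpha > 1): error grows with n,
# so few precise examples beat many imprecise ones.
err_inc = ols_generalization_error(ns, T, alpha=2.0)
print(np.all(np.diff(err_inc) > 0))  # True
```

Under this assumed power law the error is proportional to n**(alpha - 1) / T**alpha, so the sign of alpha - 1 alone decides which regime wins, matching the dichotomy stated in the abstract.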

