Proceedings Paper

How Much Data Is Sufficient to Learn High-Performing Algorithms? Generalization Guarantees for Data-Driven Algorithm Design

STOC '21: PROCEEDINGS OF THE 53RD ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING (2021)

Journal

STOC '21: PROCEEDINGS OF THE 53RD ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING

Pages 919-932

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3406325.3451036

Keywords

Automated algorithm design; data-driven algorithm design; automated algorithm configuration; machine learning theory; computational biology; mechanism design

Categories

Computer Science, Theory & Methods; Operations Research & Management Science; Mathematics, Applied

Funding

Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative [GBMF4554]
US National Institutes of Health [R01GM122935]
US National Science Foundation [IIS-1901403, IIS-1618714, CCF-1535967, CCF-1910321, SES-1919453, IIS-1718457, IIS-1617590, CCF-1733556, DBI-1937540]
US Army Research Office [W911NF-17-1-0082, W911NF2010081]
Defense Advanced Research Projects Agency [HR00112020003]
AWS Machine Learning Research Award
Amazon Research Award
Microsoft Research Faculty Fellowship
Bloomberg Research Grant
Carnegie Mellon University's Center for Machine Learning and Health
Machine Learning Research Award
U.S. Department of Defense (DOD) [W911NF2010081]

Summary

Algorithms often have tunable parameters that affect performance metrics, yet worst-case instances may be rare in practice. Data-driven algorithm design can improve performance by returning parameter settings chosen on a training set. The challenge is that performance can be a volatile function of the parameters.

Abstract

Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case guarantee. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that data-driven algorithm design can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide a broadly applicable theory for deriving generalization guarantees that bound the difference between the algorithm's average performance over the training set and its expected performance on the unknown distribution. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause a large change in behavior. Prior research (e.g., Gupta and Roughgarden, SICOMP'17; Balcan et al., COLT'17, ICML'18, EC'18) has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We uncover a unifying structure which we use to prove extremely general guarantees, yet we recover the bounds from prior research. Our guarantees, which are tight up to logarithmic factors in the worst case, apply whenever an algorithm's performance is a piecewise-constant, piecewise-linear, or, more generally, piecewise-structured function of its parameters. Our theory also implies novel bounds for voting mechanisms and dynamic programming algorithms from computational biology.
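The abstract's key observation, that performance can be a piecewise-constant function of a parameter, can be illustrated with a minimal sketch. It assumes a hypothetical greedy knapsack family in the spirit of the greedy algorithms analyzed in the prior work cited above: items are ranked by `value / size**rho` and added whenever they fit. The names `greedy_value`, `breakpoints`, and `tune_rho` and the `(items, capacity)` instance format are illustrative assumptions, not taken from the paper. The sketch shows only one way of tuning (maximizing average training performance); it does not reproduce the paper's generalization bound.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Hypothetical instance format: (items, capacity), each item is (value, size) with value, size > 0.
Instance = Tuple[List[Tuple[float, float]], float]


def greedy_value(instance: Instance, rho: float) -> float:
    """Greedy knapsack: scan items by score value / size**rho, add each item that fits."""
    items, capacity = instance
    # sorted() is stable, so items with equal scores keep their index order.
    order = sorted(range(len(items)), key=lambda i: -items[i][0] / items[i][1] ** rho)
    total, used = 0.0, 0.0
    for i in order:
        value, size = items[i]
        if used + size <= capacity:
            used += size
            total += value
    return total


def breakpoints(instance: Instance, lo: float, hi: float) -> List[float]:
    """Values of rho in [lo, hi] where two items' scores tie, so the greedy order can change."""
    items, _ = instance
    points = []
    for (v1, s1), (v2, s2) in combinations(items, 2):
        if s1 != s2:  # equal sizes: the relative order never changes with rho
            # v1 / s1**rho == v2 / s2**rho  <=>  rho = (ln v1 - ln v2) / (ln s1 - ln s2)
            rho = (math.log(v1) - math.log(v2)) / (math.log(s1) - math.log(s2))
            if lo <= rho <= hi:
                points.append(rho)
    return points


def tune_rho(train: List[Instance], lo: float, hi: float) -> Tuple[float, float]:
    """Return (rho, average value) maximizing average greedy value over rho in [lo, hi].

    Between consecutive breakpoints every instance's greedy order is fixed, so the
    average is constant there. Testing each breakpoint, both endpoints, and one
    midpoint per open interval therefore covers every distinct value.
    """
    cuts = sorted({lo, hi, *(p for inst in train for p in breakpoints(inst, lo, hi))})
    candidates = cuts + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
    best = max(candidates, key=lambda r: sum(greedy_value(inst, r) for inst in train))
    return best, sum(greedy_value(inst, best) for inst in train) / len(train)
```

For example, with items (10, 5), (6, 3), (5, 3) and capacity 6, greedy collects value 10 for rho below 1 and value 11 for rho above 1, so a single training instance already produces a step function in rho. Each instance with n items contributes at most n(n-1)/2 breakpoints; counting pieces in this way is the kind of structure the abstract refers to, although the paper's actual bounds are not reproduced here.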