Article

FALSE DISCOVERY RATE CONTROL WITH UNKNOWN NULL DISTRIBUTION: IS IT POSSIBLE TO MIMIC THE ORACLE?

ANNALS OF STATISTICS (2022)

Journal

ANNALS OF STATISTICS

Volume 50, Issue 2, Pages 1095-1123

Publisher

INST MATHEMATICAL STATISTICS-IMS

DOI: 10.1214/21-AOS2141

Keywords

Benjamini-Hochberg procedure; false discovery rate; minimax; multiple testing; phase transition; sparsity; null distribution

Funding

GDR ISIS through the projets exploratoires program (project TASTY)
[ANR-16-CE40-0019]
[ANR17-CE40-0001]
[ANR-21-CE23-0035]

Automated Summary

This paper develops theory explaining why classical multiple testing, which assumes a known null distribution, breaks down in large-scale experiments, and proposes a procedure that learns the null distribution from the data. The study focuses on Gaussian null distributions with unknown mean and variance and derives a procedure that asymptotically mimics the performance of the idealized oracle. The results establish a phase transition at the sparsity boundary n / log(n) and extend to general location models.

Abstract

Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations to understand the limitations caused by ignoring the null distribution, and how it can be properly learned from the same data set, when possible. We explore this issue in the setting where the null distributions are Gaussian with unknown rescaling parameters (mean and variance), whereas the alternative distributions are left arbitrary. In that case, an oracle procedure is the Benjamini-Hochberg procedure applied with the true (unknown) null distribution, and we aim at building a procedure that asymptotically mimics the performance of the oracle (AMO for short). Our main result establishes a phase transition at the sparsity boundary n / log(n): an AMO procedure exists if and only if the number of false nulls is of order less than n / log(n), where n is the total number of tests. Further sparsity boundaries are derived for general location models where the shape of the null distribution is not necessarily Gaussian. In light of our impossibility results, we also pursue the less stringent aim of building a nonparametric confidence region for the null distribution. From a practical perspective, this provides goodness-of-fit tests for the null distribution and allows one to assess the reliability of empirical null procedures via novel diagnostic graphs. Our results are illustrated with numerical experiments and real data sets, as detailed in a companion vignette (Roquain and Verzelen (2021)).
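To make the oracle versus plug-in comparison in the abstract concrete, here is a minimal Python sketch using only the standard library. It runs the Benjamini-Hochberg step-up procedure twice on simulated data: once with the true Gaussian null parameters (the oracle) and once with parameters estimated from the same data. The estimator used here (sample median for the mean, rescaled median absolute deviation for the standard deviation) is an assumption chosen for illustration and is not claimed to be the paper's procedure. The function names, the simulation settings, and the level alpha = 0.05 are likewise hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def two_sided_p_values(xs, theta, sigma):
    """Two-sided p-values of each observation under the null N(theta, sigma^2)."""
    std_normal = NormalDist()
    return [2.0 * (1.0 - std_normal.cdf(abs(x - theta) / sigma)) for x in xs]


def benjamini_hochberg(p_values, alpha):
    """Indices rejected by the BH step-up procedure at level alpha (sorted)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_hat = 0
    # Step-up rule: keep the LARGEST rank k with p_(k) <= alpha * k / n.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / n:
            k_hat = rank
    return sorted(order[:k_hat])


def robust_null_estimate(xs):
    """Illustrative plug-in null estimate (an assumption, not the paper's method):
    median for the null mean, MAD rescaled to the Gaussian scale for the null std."""
    theta_hat = statistics.median(xs)
    mad = statistics.median([abs(x - theta_hat) for x in xs])
    sigma_hat = mad / NormalDist().inv_cdf(0.75)
    return theta_hat, sigma_hat


if __name__ == "__main__":
    rng = random.Random(0)
    n, n_false_nulls = 2000, 20          # sparse regime, hypothetical sizes
    theta, sigma, alpha = 0.5, 2.0, 0.05  # true null parameters, unknown in practice

    # The first n_false_nulls observations are shifted away from the null.
    xs = [rng.gauss(theta + 6.0 * sigma, sigma) for _ in range(n_false_nulls)]
    xs += [rng.gauss(theta, sigma) for _ in range(n - n_false_nulls)]

    oracle = benjamini_hochberg(two_sided_p_values(xs, theta, sigma), alpha)
    theta_hat, sigma_hat = robust_null_estimate(xs)
    plug_in = benjamini_hochberg(two_sided_p_values(xs, theta_hat, sigma_hat), alpha)

    print("estimated null:", round(theta_hat, 3), round(sigma_hat, 3))
    print("oracle rejections:", len(oracle), "plug-in rejections:", len(plug_in))
```

With few false nulls, the median and MAD are barely contaminated, so the plug-in rejections should track the oracle closely. Raising n_false_nulls toward a constant fraction of n lets the contamination bias the estimated null, which is an informal way to see why the abstract's sparsity boundary matters.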
