Journal
MANAGEMENT SCIENCE
Volume -, Issue -, Pages -
Publisher
INFORMS
DOI: 10.1287/mnsc.2021.4207
Keywords
statistics; design of experiments; decision analysis; inference; A/B testing; false discovery rate
Funding
- Wharton Dean's Research Fund
The study finds that about 70% of the effects tested in website A/B experiments are true nulls, which drives false discovery rates of 18% to 37% among significant results. Decision makers should expect roughly one in five interventions that reach significance at the 5% level to be ineffective in practice.
We investigate what fraction of all significant results in website A/B testing is actually null effects (i.e., the false discovery rate (FDR)). Our data consist of 4,964 effects from 2,766 experiments conducted on a commercial A/B testing platform. Using three different methods, we find that the FDR ranges between 28% and 37% for tests conducted at 10% significance and between 18% and 25% for tests at 5% significance (two sided). These high FDRs stem mostly from the high fraction of true null effects, about 70%, rather than from low power. Using our estimates, we also assess the potential of various A/B test designs to reduce the FDR. The two main implications are that decision makers should expect one in five interventions achieving significance at the 5% level to be ineffective when deployed in the field and that analysts should consider using two-stage designs with multiple variations rather than basic A/B tests.
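To see why a high share of true nulls pushes the FDR up even when power is adequate, the relationship can be sketched with the textbook identity FDR = π0·α / (π0·α + (1 − π0)·power). This is an illustration only, not one of the three estimation methods the paper uses. The null fraction of 0.70 comes from the abstract; the function name and the power values below are hypothetical, chosen for illustration.

```python
def false_discovery_rate(null_fraction, alpha, power):
    """Expected share of significant results that are true nulls.

    null_fraction: share of tested effects that are truly null (pi_0)
    alpha:         significance level of each test
    power:         average probability of detecting a non-null effect
    """
    false_positives = null_fraction * alpha          # nulls flagged significant
    true_positives = (1.0 - null_fraction) * power   # real effects detected
    return false_positives / (false_positives + true_positives)


NULL_FRACTION = 0.70  # approximate value reported in the abstract

# The power values are assumptions for illustration, not estimates from the paper.
for alpha in (0.05, 0.10):
    for power in (0.4, 0.6, 0.8, 1.0):
        fdr = false_discovery_rate(NULL_FRACTION, alpha, power)
        print(f"alpha={alpha:.2f}  power={power:.1f}  FDR={fdr:.1%}")
```

Under this identity, with 70% true nulls and α = 0.05, the FDR stays above 10% even at perfect power (0.035 / 0.335), which is consistent with the abstract's point that the high FDRs stem mostly from the null fraction rather than from low power.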