期刊
MANAGEMENT SCIENCE
卷 -, 期 -, 页码 -出版社
INFORMS
DOI: 10.1287/mnsc.2021.4207
关键词
statistics; design of experiments; decision analysis; inference; A/B testing; false discovery rate
资金
- Wharton Dean's Research Fund
The study reveals that up to 70% of significant results in website A/B testing are actually null effects, leading to high false discovery rates. Decision makers should be aware that one in five interventions achieving significance at a 5% confidence level may be ineffective in practice.
We investigate what fraction of all significant results in website A/B testing is actually null effects (i.e., the false discovery rate (FDR)). Our data consist of 4,964 effects from 2,766 experiments conducted on a commercial A/B testing platform. Using three different methods, we find that the FDR ranges between 28% and 37% for tests conducted at 10% significance and between 18% and 25% for tests at 5% significance (two sided). These high FDRs stem mostly from the high fraction of true null effects, about 70%, rather than from low power. Using our estimates, we also assess the potential of various A/B test designs to reduce the FDR. The two main implications are that decision makers should expect one in five interventions achieving significance at 5% confidence to be ineffective when deployed in the field and that analysts should consider using two-stage designs with multiple variations rather than basic A/B tests.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据