4.7 Article

False Discovery in A/B Testing

Journal

MANAGEMENT SCIENCE
Volume -, Issue -, Pages -

Publisher

INFORMS
DOI: 10.1287/mnsc.2021.4207

Keywords

statistics; design of experiments; decision analysis; inference; A/B testing; false discovery rate

Funding

  1. Wharton Dean's Research Fund


The study finds that about 70% of the effects tested in website A/B experiments are true nulls, which drives high false discovery rates among significant results. Decision makers should expect roughly one in five interventions that reach significance at the 5% level to be ineffective in practice.
We investigate what fraction of all significant results in website A/B testing is actually null effects (i.e., the false discovery rate (FDR)). Our data consist of 4,964 effects from 2,766 experiments conducted on a commercial A/B testing platform. Using three different methods, we find that the FDR ranges between 28% and 37% for tests conducted at 10% significance and between 18% and 25% for tests at 5% significance (two sided). These high FDRs stem mostly from the high fraction of true null effects, about 70%, rather than from low power. Using our estimates, we also assess the potential of various A/B test designs to reduce the FDR. The two main implications are that decision makers should expect one in five interventions achieving significance at 5% confidence to be ineffective when deployed in the field and that analysts should consider using two-stage designs with multiple variations rather than basic A/B tests.
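The link between the share of true nulls, power, and the FDR can be illustrated with the textbook identity FDR = π0·α / (π0·α + (1 − π0)·power). The sketch below is not one of the three estimation methods used in the paper; it is a minimal back-of-the-envelope calculation. The function name `false_discovery_rate` and the power values are assumptions chosen for illustration, and only the 70% null share and the 5% significance level come from the abstract.

```python
def false_discovery_rate(pi0: float, alpha: float, power: float) -> float:
    """Expected share of significant results that are true nulls.

    pi0   -- fraction of tested effects that are truly null
    alpha -- significance level of each test
    power -- average probability of detecting a non-null effect (assumed)
    """
    false_positives = pi0 * alpha           # true nulls that reach significance
    true_positives = (1.0 - pi0) * power    # real effects that reach significance
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    pi0 = 0.70    # from the abstract: about 70% of effects are true nulls
    alpha = 0.05  # from the abstract: tests at 5% significance
    # The power values below are hypothetical, not estimates from the paper.
    for power in (0.3, 0.5, 0.8):
        fdr = false_discovery_rate(pi0, alpha, power)
        print(f"power={power:.1f}  FDR={fdr:.3f}")
```

Under these assumptions, even fairly high power leaves a sizable FDR because the numerator is fixed by the large null share, which is consistent with the abstract's point that the high FDRs stem mostly from the fraction of true nulls rather than from low power.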
