The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research

Editorial Material Obstetrics & Gynecology

Multiple comparisons: a tutorial. Part 1. Understanding hypothesis testing

Michael T. Lawson et al.

BJOG-AN INTERNATIONAL JOURNAL OF OBSTETRICS AND GYNAECOLOGY (2021)

Article Multidisciplinary Sciences

Meta-assessment of bias in science

Daniele Fanelli et al.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2017)

Article Biology

Reproducibility in Cancer Biology: Making sense of replications

Brian A. Nosek et al.

ELIFE (2017)

Article Mathematics, Interdisciplinary Applications

Is Most Published Research Really False?

Jeffrey T. Leek et al.

ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 4 (2017)

Letter Anesthesiology

Most of the time, P is an unreliable marker, so we need no exact cut-off

G. B. Drummond

BRITISH JOURNAL OF ANAESTHESIA (2016)

Article Ecology

Underappreciated problems of low replication in ecological field studies

Nathan P. Lemoine et al.

ECOLOGY (2016)

Article Public, Environmental & Occupational Health

Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations

Sander Greenland et al.

EUROPEAN JOURNAL OF EPIDEMIOLOGY (2016)

Article Medicine, General & Internal

Evolution of Reporting P Values in the Biomedical Literature, 1990-2015

David Chavalarias et al.

JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION (2016)

Article Mathematics, Interdisciplinary Applications

Rejection odds and rejection ratios: A proposal for statistical practice in testing hypotheses

M. J. Bayarri et al.

JOURNAL OF MATHEMATICAL PSYCHOLOGY (2016)

Article Management

Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence

Blakeley B. McShane et al.

MANAGEMENT SCIENCE (2016)

Editorial Material Multidisciplinary Sciences

IS THERE A REPRODUCIBILITY CRISIS?

Monya Baker

NATURE (2016)

Letter Biochemical Research Methods

Confidence intervals are no salvation from the alleged fickleness of the P value

Jacques van Helden

NATURE METHODS (2016)

Article Cell Biology

What does research reproducibility mean?

Steven N. Goodman et al.

SCIENCE TRANSLATIONAL MEDICINE (2016)

Review Ecology

Transparency in Ecology and Evolution: Real Problems, Real Solutions

Timothy H. Parker et al.

TRENDS IN ECOLOGY & EVOLUTION (2016)

Article Biochemistry & Molecular Biology

Current Incentives for Scientists Lead to Underpowered Studies with Erroneous Conclusions

Andrew D. Higginson et al.

PLOS BIOLOGY (2016)

Article Multidisciplinary Sciences

Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value

Dorothy V. M. Bishop et al.

PEERJ (2016)

Article Multidisciplinary Sciences

The natural selection of bad science

Paul E. Smaldino et al.

ROYAL SOCIETY OPEN SCIENCE (2016)

Article Health Care Sciences & Services

Obtaining evidence by a single well-powered trial or several modestly powered trials

Joanna IntHout et al.

STATISTICAL METHODS IN MEDICAL RESEARCH (2016)

Article Psychology, Multidisciplinary

Marginally Significant Effects as Evidence for Hypotheses: Changing Attitudes Over Four Decades

Laura Pritschet et al.

PSYCHOLOGICAL SCIENCE (2016)

Article Sociology

Damaging Real Lives Through Obstinacy: Re-Emphasising Why Significance Testing is Wrong

Stephen Gorard

SOCIOLOGICAL RESEARCH ONLINE (2016)

Article Psychology, Multidisciplinary

Misconceptions of the p-value among Chilean and Italian Academic Psychologists

Laura Badenes-Ribera et al.

FRONTIERS IN PSYCHOLOGY (2016)

Article Psychology, Social

Conceptualizing and evaluating the replication of research results

Leandre R. Fabrigar et al.

JOURNAL OF EXPERIMENTAL SOCIAL PSYCHOLOGY (2016)

Article Psychology, Multidisciplinary

What Should Researchers Expect When They Replicate Studies? A Statistical View of Replicability in Psychological Science

Prasad Patil et al.

PERSPECTIVES ON PSYCHOLOGICAL SCIENCE (2016)

News Item Multidisciplinary Sciences

How scientists fool themselves - and how they can stop

Regina Nuzzo

NATURE (2015)

Article Biochemical Research Methods

The fickle P value generates irreproducible results

Lewis G. Halsey et al.

NATURE METHODS (2015)

Article Plant Sciences

Does the P Value Have a Future in Plant Pathology?

L. V. Madden et al.

PHYTOPATHOLOGY (2015)

Article Multidisciplinary Sciences

Estimating the reproducibility of psychological science

Alexander A. Aarts et al.

SCIENCE (2015)

Article Multidisciplinary Sciences

A surge of p-values between 0.041 and 0.049 in recent decades (but negative results are increasing rapidly too)

Joost C. F. de Winter et al.

PEERJ (2015)

Article Computer Science, Interdisciplinary Applications

Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations

Jesper W. Schneider

SCIENTOMETRICS (2015)

Article Psychology, Multidisciplinary

Is Psychology Suffering From a Replication Crisis? What Does Failure to Replicate Really Mean?

Scott E. Maxwell et al.

AMERICAN PSYCHOLOGIST (2015)

Article Psychology, Multidisciplinary

Small Telescopes: Detectability and the Evaluation of Replication Results

Uri Simonsohn

PSYCHOLOGICAL SCIENCE (2015)

Article Multidisciplinary Sciences

The Statistical Crisis in Science

Andrew Gelman et al.

AMERICAN SCIENTIST (2014)

Editorial Material Ecology

Rejoinder

Paul A. Murtaugh

ECOLOGY (2014)

Editorial Material Ecology

Comment on Murtaugh

Michael Lavine

ECOLOGY (2014)

Article Ecology

To P or not to P?

Jarrett J. Barber et al.

ECOLOGY (2014)

Article Ecology

In defense of P values

Paul A. Murtaugh

ECOLOGY (2014)

Review Health Care Sciences & Services

Six Persistent Research Misconceptions

Kenneth J. Rothman

JOURNAL OF GENERAL INTERNAL MEDICINE (2014)

Article Medicine, General & Internal

Increasing value and reducing waste in research design, conduct, and analysis

John P. A. Ioannidis et al.

LANCET (2014)

Article Biochemistry & Molecular Biology

P-values in genomics: Apparent precision masks high uncertainty

L. C. Lazzeroni et al.

MOLECULAR PSYCHIATRY (2014)

Article Multidisciplinary Sciences

Why Publishing Everything Is More Effective than Selective Publishing of Statistically Significant Results

Marcel A. L. M. van Assen et al.

PLOS ONE (2014)

Letter Multidisciplinary Sciences

Adaptive revised standards for statistical evidence

Luis Pericchi et al.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2014)

Letter Multidisciplinary Sciences

Revised evidence for statistical standards

Andrew Gelman et al.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2014)

Letter Multidisciplinary Sciences

Reproducibility issues in science, is P value really the only answer?

Jean Gaudart et al.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2014)

Letter Multidisciplinary Sciences

Reply to Gelman, Gaudart, Pericchi: More reasons to revise standards for statistical evidence

Valen E. Johnson

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2014)

Editorial Material Psychology, Biological

On the persistence of low power in psychological science

Ivan Vankov et al.

QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY (2014)

Editorial Material Medicine, General & Internal

How to Make More Published Research True

John P. A. Ioannidis

PLOS MEDICINE (2014)

Article Multidisciplinary Sciences

An investigation of the false discovery rate and the misinterpretation of p-values

David Colquhoun

ROYAL SOCIETY OPEN SCIENCE (2014)

Article Psychology, Multidisciplinary

Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors

Andrew Gelman et al.

PERSPECTIVES ON PSYCHOLOGICAL SCIENCE (2014)

Article Psychology, Multidisciplinary

Using Bayes to get the most out of non-significant results

Zoltan Dienes

FRONTIERS IN PSYCHOLOGY (2014)

Article Psychology, Multidisciplinary

Expectations for Replications: Are Yours Realistic?

David J. Stanley et al.

PERSPECTIVES ON PSYCHOLOGICAL SCIENCE (2014)

Article Psychology, Mathematical

When decision heuristics and science collide

Erica C. Yu et al.

PSYCHONOMIC BULLETIN & REVIEW (2014)

Article Psychology, Multidisciplinary

Malignant side effects of null-hypothesis significance testing

Marc Branch

THEORY & PSYCHOLOGY (2014)

Article Psychology, Multidisciplinary

The New Statistics: Why and How

Geoff Cumming

PSYCHOLOGICAL SCIENCE (2014)

Editorial Material Multidisciplinary Sciences

Do We Really Need the S-word?

Megan D. Higgs

AMERICAN SCIENTIST (2013)

Article Health Care Sciences & Services

How confidence intervals become confusion intervals

James McCormack et al.

BMC MEDICAL RESEARCH METHODOLOGY (2013)

Editorial Material Public, Environmental & Occupational Health

Living with Statistics in Observational Research

Sander Greenland et al.

EPIDEMIOLOGY (2013)

Editorial Material Public, Environmental & Occupational Health

Reconciling Theory and Practice: What Is to Be Done with P Values?

David A. Savitz

EPIDEMIOLOGY (2013)

Review Neurosciences

Deep impact: unintended consequences of journal rank

Bjoern Brembs et al.

FRONTIERS IN HUMAN NEUROSCIENCE (2013)

Article Mathematics, Interdisciplinary Applications

Replication, statistical consistency, and publication bias

Gregory Francis

JOURNAL OF MATHEMATICAL PSYCHOLOGY (2013)

Editorial Material Mathematics, Interdisciplinary Applications

Interrogating p-values

Andrew Gelman

JOURNAL OF MATHEMATICAL PSYCHOLOGY (2013)

Letter Neurosciences

Confidence and precision increase with high statistical power

Katherine S. Button et al.

NATURE REVIEWS NEUROSCIENCE (2013)

Review Neurosciences

Power failure: why small sample size undermines the reliability of neuroscience

Katherine S. Button et al.

NATURE REVIEWS NEUROSCIENCE (2013)

Review Multidisciplinary Sciences

Systematic Review of the Empirical Evidence of Study Publication Bias and Outcome Reporting Bias - An Updated Review

Kerry Dwan et al.

PLOS ONE (2013)

Article Multidisciplinary Sciences

Revised standards for statistical evidence

Valen E. Johnson

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2013)

Article Psychology, Multidisciplinary

Why the Resistance to Statistical Innovations? Bridging the Communication Gap

Donald Sharpe

PSYCHOLOGICAL METHODS (2013)

Article Public, Environmental & Occupational Health

Nonsignificance Plus High Power Does Not Imply Support for the Null Over the Alternative

Sander Greenland

ANNALS OF EPIDEMIOLOGY (2012)

Article Psychology, Educational

Confidence Intervals Make a Difference: Effects of Showing Confidence Intervals on Inferential Reasoning

Rink Hoekstra et al.

EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT (2012)

Article Computer Science, Interdisciplinary Applications

Negative results are disappearing from most disciplines and countries

Daniele Fanelli

SCIENTOMETRICS (2012)

Article Psychology, Multidisciplinary

A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science's Aversion to the Null

Christopher J. Ferguson et al.

PERSPECTIVES ON PSYCHOLOGICAL SCIENCE (2012)

Article Social Sciences, Mathematical Methods

Subjective p Intervals: Researchers Underestimate the Variability of p Values Over Replication

Jerry Lai et al.

METHODOLOGY-EUROPEAN JOURNAL OF RESEARCH METHODS FOR THE BEHAVIORAL AND SOCIAL SCIENCES (2012)

Article Psychology, Multidisciplinary

Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling

Leslie K. John et al.

PSYCHOLOGICAL SCIENCE (2012)

Article Psychology, Applied

How Can Significance Tests Be Deinstitutionalized?

Marc Orlitzky

ORGANIZATIONAL RESEARCH METHODS (2012)

Article Statistics & Probability

P-Value Precision and Reproducibility

Dennis D. Boos et al.

AMERICAN STATISTICIAN (2011)

Review Behavioral Sciences

Issues in information theory-based statistical inference - a commentary from a frequentist's perspective

Roger Mundry

BEHAVIORAL ECOLOGY AND SOCIOBIOLOGY (2011)

Article Behavioral Sciences

Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse

Wolfgang Forstmeier et al.

BEHAVIORAL ECOLOGY AND SOCIOBIOLOGY (2011)

Article Public, Environmental & Occupational Health

Magnitude of effects in clinical trials published in high-impact general medical journals

Konstantinos C. M. Siontis et al.

INTERNATIONAL JOURNAL OF EPIDEMIOLOGY (2011)

Article Psychology, Multidisciplinary

Bayes Factor Approaches for Testing Interval Null Hypotheses

Richard D. Morey et al.

PSYCHOLOGICAL METHODS (2011)

Article Psychology, Multidisciplinary

False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant

Joseph P. Simmons et al.

PSYCHOLOGICAL SCIENCE (2011)

Article Statistics & Probability

Fisher, Neyman, and the Creation of Classical Statistics

Erich L. Lehmann

FISHER, NEYMAN, AND THE CREATION OF CLASSICAL STATISTICS (2011)

Editorial Material Psychiatry

How reliable are scientific studies?

Marcus R. Munafo et al.

BRITISH JOURNAL OF PSYCHIATRY (2010)

Article Health Care Sciences & Services

Dissemination and publication of research findings: an updated review of related biases

F. Song et al.

HEALTH TECHNOLOGY ASSESSMENT (2010)

Review Mathematical & Computational Biology

Meta-research: The art of getting it wrong

John P. A. Ioannidis

RESEARCH SYNTHESIS METHODS (2010)

Article Psychology, Multidisciplinary

Confidence intervals permit, but do not guarantee, better inference than statistical significance testing

Melissa Coulson et al.

FRONTIERS IN PSYCHOLOGY (2010)

Review Ecology

Final collapse of the Neyman-Pearson decision theoretic framework and rise of the neoFisherian

Stuart H. Hurlbert et al.

ANNALES ZOOLOGICI FENNICI (2009)

Letter Biochemistry & Molecular Biology

Bias in genetic association studies and impact factor

M. R. Munafo et al.

MOLECULAR PSYCHIATRY (2009)

Article Psychology

The Importance of Proving the Null

C. R. Gallistel

PSYCHOLOGICAL REVIEW (2009)

Review Psychology, Mathematical

What is the probability of replicating a statistically significant effect?

Jeff Miller

PSYCHONOMIC BULLETIN & REVIEW (2009)

Article Statistics & Probability

P-values are random variables

Duncan J. Murdoch et al.

AMERICAN STATISTICIAN (2008)

Review Public, Environmental & Occupational Health

Why most discovered true associations are inflated

John P. A. Ioannidis

EPIDEMIOLOGY (2008)

Editorial Material Medicine, General & Internal

Why Current Publication Practices May Distort Science

Neal S. Young et al.

PLOS MEDICINE (2008)

Article Psychology, Multidisciplinary

Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better

Geoff Cumming

PERSPECTIVES ON PSYCHOLOGICAL SCIENCE (2008)

Article Communication

A communication researchers' guide to null hypothesis significance testing and alternatives

Timothy R. Levine et al.

HUMAN COMMUNICATION RESEARCH (2008)

Article Social Sciences, Mathematical Methods

Publication bias in empirical sociological research - Do arbitrary significance levels distort published results?

Alan S. Gerber et al.

SOCIOLOGICAL METHODS & RESEARCH (2008)

Article Education & Educational Research

Inference by Eye: Pictures of Confidence Intervals and Thinking About Levels of Confidence

Geoff Cumming

TEACHING STATISTICS (2007)

Article Genetics & Heredity

Upward bias in odds ratio estimates from genome-wide association studies

Chad Garner

GENETIC EPIDEMIOLOGY (2007)

Article Genetics & Heredity

Overcoming the winner's curse: Estimating penetrance parameters from case-control data

Sebastian Zollner et al.

AMERICAN JOURNAL OF HUMAN GENETICS (2007)

Article Psychology, Mathematical

Probability as certainty: Dichotomous thinking and the misuse of p values

Rink Hoekstra et al.

PSYCHONOMIC BULLETIN & REVIEW (2006)

Article Statistics & Probability

The difference between significant and not significant is not itself statistically significant

Andrew Gelman et al.

AMERICAN STATISTICIAN (2006)

Article Biodiversity Conservation

Impact of criticism of null-hypothesis significance testing on statistical reporting practices in conservation biology

Fiona Fidler et al.

CONSERVATION BIOLOGY (2006)

Article Ecology

Why do we still use stepwise modelling in ecology and behaviour?

Mark J. Whittingham et al.

JOURNAL OF ANIMAL ECOLOGY (2006)

Article History & Philosophy Of Science

Models and statistical inference: The controversy between Fisher and Neyman-Pearson

J Lenhard

BRITISH JOURNAL FOR THE PHILOSOPHY OF SCIENCE (2006)

Article Psychology, Clinical

Misuse of statistical tests in Archives of Clinical Neuropsychology publications

P Schatz et al.

ARCHIVES OF CLINICAL NEUROPSYCHOLOGY (2005)

Review Medicine, General & Internal

Contradicted and initially stronger effects in highly cited clinical research

JPA Ioannidis

JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION (2005)

Article Medicine, General & Internal

Gastrointestinal tolerability and effectiveness of rofecoxib versus naproxen in the treatment of osteoarthritis - A randomized, controlled trial

JR Lisse et al.

ANNALS OF INTERNAL MEDICINE (2003)

Article Psychology, Experimental

The p-value fallacy and how to avoid it

P Dixon

CANADIAN JOURNAL OF EXPERIMENTAL PSYCHOLOGY-REVUE CANADIENNE DE PSYCHOLOGIE EXPERIMENTALE (2003)

Article Statistics & Probability

Confusion over measures of evidence (p's) versus errors (α's) in classical statistical testing

R Hubbard et al.

AMERICAN STATISTICIAN (2003)

Article Behavioral Sciences

A survey of the statistical power of research in behavioral ecology and animal behavior

MD Jennions et al.

BEHAVIORAL ECOLOGY (2003)

Article Psychology, Multidisciplinary

Even statisticians are not immune to misinterpretations of null hypothesis significance tests

MP Lecoutre et al.

INTERNATIONAL JOURNAL OF PSYCHOLOGY (2003)

Letter Mathematical & Computational Biology

A comment on replication, p-values and evidence

S Senn

STATISTICS IN MEDICINE (2002)

Article Biology

Relationships fade with time: a meta-analysis of temporal trends in publication in ecology and evolution

MD Jennions et al.

PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES (2002)

Article Psychology, Multidisciplinary

Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests

WW Tryon

PSYCHOLOGICAL METHODS (2001)

Article Psychology, Mathematical

Interpretation of significance levels by psychological researchers: The .05 cliff effect may be overstated

J Poitevineau et al.

PSYCHONOMIC BULLETIN & REVIEW (2001)

Article Genetics & Heredity

Large upward bias in estimation of locus-specific effects from genomewide scans

HHH Göring et al.

AMERICAN JOURNAL OF HUMAN GENETICS (2001)

Article Statistics & Probability

Calibration of p values for testing precise null hypotheses

T Sellke et al.

AMERICAN STATISTICIAN (2001)

Article Medicine, General & Internal

Sifting the evidence - what's wrong with significance tests?

JAC Sterne et al.

BMJ-BRITISH MEDICAL JOURNAL (2001)

Article Psychology, Multidisciplinary

Null hypothesis significance testing - On the survival of a flawed method

J Krueger

AMERICAN PSYCHOLOGIST (2001)

Article Ecology

Null hypothesis testing: Problems, prevalence, and an alternative

DR Anderson et al.

JOURNAL OF WILDLIFE MANAGEMENT (2000)
