Article

Comparing Forecast Skill

Journal

Monthly Weather Review
Volume 142, Issue 12, Pages 4658-4678

Publisher

American Meteorological Society
DOI: 10.1175/MWR-D-14-00045.1

Funding

  1. National Oceanic and Atmospheric Administration under Climate Test Bed program [NA10OAR4310264]
  2. National Science Foundation [ATM0332910, ATM0830062, ATM0830068]
  3. National Aeronautics and Space Administration [NNG04GG46G, NNX09AN50G]
  4. National Oceanic and Atmospheric Administration [NA04OAR4310034, NA09OAR4310058, NA10OAR4310210, NA10OAR4310249, NA12OAR4310091]
  5. Office of Naval Research Award [N00014-12-1-091]
  6. NOAA
  7. NSF
  8. NASA
  9. DOE

Abstract

A basic question in forecasting is whether one prediction system is more skillful than another. Some commonly used statistical significance tests cannot answer this question correctly when the skills are computed over a common period or from a common set of observations, because these tests do not account for correlations between the sample skill estimates. Furthermore, the results of these tests are biased toward indicating no difference in skill, a fact that has important consequences for forecast improvement. This paper shows that the magnitude of the bias is characterized by a few parameters, such as the sample size and the correlation between the forecasts and their errors, which, perhaps surprisingly, can be estimated from data. The bias is substantial for typical seasonal forecasts, implying that familiar tests may wrongly judge differences in seasonal forecast skill to be insignificant. Four tests that are appropriate for assessing differences in skill over a common period are reviewed: the sign test, the Wilcoxon signed-rank test, the Morgan-Granger-Newbold test, and a permutation test. These techniques are applied to ENSO hindcasts from the North American Multimodel Ensemble and reveal that the Climate Forecast System, version 2 (CFSv2), and the Canadian Climate Model, version 3 (CanCM3), outperform the other models in the sense that their squared errors are smaller than those of the other single models more frequently. It should be recognized that, while certain models may be superior in a certain sense for a particular period and variable, combinations of forecasts are often significantly more skillful than any single model alone. In fact, the multimodel mean significantly outperforms all single models.
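
The four paired tests named above are simple enough to sketch directly. The following Python fragment is a minimal illustration, not the authors' code: the function names, the synthetic error series, and the sign-flipping variant of the permutation test are assumptions made for the example. Each function takes error (or squared-error) series from two forecasts verified against the same observations and returns a two-sided p-value.

  # Minimal sketch of four paired tests for comparing forecast skill over a
  # common verification period. Names and data are illustrative, not taken
  # from the paper.
  import numpy as np
  from scipy import stats

  def sign_test(se1, se2):
      # Sign test: under the null of equal skill, forecast 1's squared error
      # is smaller than forecast 2's with probability 1/2 on each case.
      wins = int(np.sum(se1 < se2))
      n = int(np.sum(se1 != se2))  # ties carry no information
      return stats.binomtest(wins, n=n, p=0.5).pvalue

  def wilcoxon_test(se1, se2):
      # Wilcoxon signed-rank test on the paired squared-error differences.
      return stats.wilcoxon(se1 - se2).pvalue

  def mgn_test(e1, e2):
      # Morgan-Granger-Newbold test: with s = e1 + e2 and d = e1 - e2,
      # equal mean squared error implies corr(s, d) = 0, and the sample
      # correlation maps onto a t statistic with n - 2 degrees of freedom.
      s, d = e1 + e2, e1 - e2
      r = np.corrcoef(s, d)[0, 1]
      n = e1.size
      t = r * np.sqrt((n - 2) / (1.0 - r**2))
      return 2.0 * stats.t.sf(abs(t), df=n - 2)

  def permutation_test(se1, se2, n_perm=10000, seed=0):
      # Paired (sign-flip) permutation test: under the null, each paired
      # squared-error difference is equally likely to have either sign.
      rng = np.random.default_rng(seed)
      d = se1 - se2
      observed = abs(d.mean())
      flips = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
      null = np.abs((flips * d).mean(axis=1))
      return float(np.mean(null >= observed))

  # Synthetic errors standing in for two models' hindcast errors on a
  # common 120-case verification period; model 1 is built to be better.
  rng = np.random.default_rng(1)
  e1 = 0.8 * rng.standard_normal(120)
  e2 = 1.0 * rng.standard_normal(120)
  se1, se2 = e1**2, e2**2
  for name, p in [("sign", sign_test(se1, se2)),
                  ("Wilcoxon", wilcoxon_test(se1, se2)),
                  ("MGN", mgn_test(e1, e2)),
                  ("permutation", permutation_test(se1, se2))]:
      print(f"{name:12s} p = {p:.3f}")

Because both forecasts are scored on a common set of observations, each test operates on paired differences, and that pairing is what absorbs the correlation between the two sample skill estimates that the abstract identifies as the quantity standard unpaired tests ignore.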

Authors

Timothy DelSole and Michael K. Tippett
