Article

Comparing Forecast Skill

Journal

Monthly Weather Review
Volume 142, Issue 12, Pages 4658-4678

Publisher

American Meteorological Society
DOI: 10.1175/MWR-D-14-00045.1

Funding

  1. National Oceanic and Atmospheric Administration under Climate Test Bed program [NA10OAR4310264]
  2. National Science Foundation [ATM0332910, ATM0830062, ATM0830068]
  3. National Aeronautics and Space Administration [NNG04GG46G, NNX09AN50G]
  4. National Oceanic and Atmospheric Administration [NA04OAR4310034, NA09OAR4310058, NA10OAR4310210, NA10OAR4310249, NA12OAR4310091]
  5. Office of Naval Research Award [N00014-12-1-091]
  6. NOAA
  7. NSF
  8. NASA
  9. DOE

Abstract

A basic question in forecasting is whether one prediction system is more skillful than another. Some commonly used statistical significance tests cannot answer this question correctly when the skills are computed over a common period or from a common set of observations, because these tests do not account for correlations between the sample skill estimates. Furthermore, the results of these tests are biased toward indicating no difference in skill, a fact that has important consequences for forecast improvement. This paper shows that the magnitude of the bias is characterized by a few parameters, such as the sample size and the correlation between the forecasts and their errors, which, perhaps surprisingly, can be estimated from data. The bias is substantial for typical seasonal forecasts, implying that familiar tests may wrongly judge differences in seasonal forecast skill to be insignificant. Four tests that are appropriate for assessing differences in skill over a common period are reviewed: the sign test, the Wilcoxon signed-rank test, the Morgan-Granger-Newbold test, and a permutation test. These techniques are applied to ENSO hindcasts from the North American Multimodel Ensemble and reveal that the Climate Forecast System, version 2 (CFSv2), and the Canadian Climate Model, version 3 (CanCM3), outperform the other models in the sense that their squared errors are smaller than those of the other single models more frequently. It should be recognized that, while certain models may be superior in a certain sense for a particular period and variable, combinations of forecasts are often significantly more skillful than any single model alone. In fact, the multimodel mean significantly outperforms all single models.
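
The four paired tests named above are simple enough to sketch directly. The following Python fragment is a minimal illustration, not the authors' code: the function names, the synthetic error series, and the sign-flipping variant of the permutation test are assumptions made for the example. Each function takes error (or squared-error) series from two forecasts verified against the same observations and returns a two-sided p-value.

  # Minimal sketch of four paired tests for comparing forecast skill over a
  # common verification period. Names and data are illustrative, not taken
  # from the paper.
  import numpy as np
  from scipy import stats

  def sign_test(se1, se2):
      # Sign test: under the null of equal skill, forecast 1's squared error
      # is smaller than forecast 2's with probability 1/2 on each case.
      wins = int(np.sum(se1 < se2))
      n = int(np.sum(se1 != se2))  # ties carry no information
      return stats.binomtest(wins, n=n, p=0.5).pvalue

  def wilcoxon_test(se1, se2):
      # Wilcoxon signed-rank test on the paired squared-error differences.
      return stats.wilcoxon(se1 - se2).pvalue

  def mgn_test(e1, e2):
      # Morgan-Granger-Newbold test: with s = e1 + e2 and d = e1 - e2,
      # equal mean squared error implies corr(s, d) = 0, and the sample
      # correlation maps onto a t statistic with n - 2 degrees of freedom.
      s, d = e1 + e2, e1 - e2
      r = np.corrcoef(s, d)[0, 1]
      n = e1.size
      t = r * np.sqrt((n - 2) / (1.0 - r**2))
      return 2.0 * stats.t.sf(abs(t), df=n - 2)

  def permutation_test(se1, se2, n_perm=10000, seed=0):
      # Paired (sign-flip) permutation test: under the null, each paired
      # squared-error difference is equally likely to have either sign.
      rng = np.random.default_rng(seed)
      d = se1 - se2
      observed = abs(d.mean())
      flips = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
      null = np.abs((flips * d).mean(axis=1))
      return float(np.mean(null >= observed))

  # Synthetic errors standing in for two models' hindcast errors on a
  # common 120-case verification period; model 1 is built to be better.
  rng = np.random.default_rng(1)
  e1 = 0.8 * rng.standard_normal(120)
  e2 = 1.0 * rng.standard_normal(120)
  se1, se2 = e1**2, e2**2
  for name, p in [("sign", sign_test(se1, se2)),
                  ("Wilcoxon", wilcoxon_test(se1, se2)),
                  ("MGN", mgn_test(e1, e2)),
                  ("permutation", permutation_test(se1, se2))]:
      print(f"{name:12s} p = {p:.3f}")

Because both forecasts are scored on a common set of observations, each test operates on paired differences, and that pairing is what absorbs the correlation between the two sample skill estimates that the abstract identifies as the quantity standard unpaired tests ignore.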

Authors

Timothy DelSole and Michael K. Tippett
