A Debiased Ranked Probability Skill Score to Evaluate Probabilistic Ensemble Forecasts with Small Ensemble Sizes. Müller, W. A., Appenzeller, C., Doblas-Reyes, F. J., & Liniger, M. A.
The ranked probability skill score (RPSS) is a widely used measure to quantify the skill of ensemble forecasts. The underlying score is defined by the quadratic norm and is comparable to the mean squared error (MSE), but it is applied in probability space. It is sensitive to the shape and the shift of the predicted probability distributions. However, the RPSS shows a negative bias for ensemble systems with small ensemble size, as recently shown. Here, two strategies are explored to tackle this flaw of the RPSS. First, the RPSS is examined for different norms L (RPSS_L). It is shown that the RPSS_{L=1}, based on the absolute rather than the squared difference between forecasted and observed cumulative probability distribution, is unbiased; RPSS_L defined with higher-order norms show a negative bias. However, the RPSS_{L=1} is not strictly proper in a statistical sense. A second approach is then investigated, which is based on the quadratic norm but with sampling errors in climatological probabilities considered in the reference forecasts. This technique is based on strictly proper scores and results in an unbiased skill score, which is denoted as the debiased ranked probability skill score (RPSS_D) hereafter. Both newly defined skill scores are independent of the ensemble size, whereas the associated confidence intervals are a function of the ensemble size and the number of forecasts. The RPSS_{L=1} and the RPSS_D are then applied to the winter mean [December–January–February (DJF)] near-surface temperature predictions of the ECMWF Seasonal Forecast System 2. The overall structures of the RPSS_{L=1} and the RPSS_D are more consistent and largely independent of the ensemble size, unlike the RPSS_{L=2}. Furthermore, the minimum ensemble size required to predict a climate anomaly given a known signal-to-noise ratio is determined by employing the new skill scores. For a hypothetical setup comparable to the ECMWF hindcast system (40 members and 15 hindcast years), statistically significant skill scores were only found for a signal-to-noise ratio larger than ~0.3.
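
The two strategies described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: the function names (`rps`, `ensemble_probs`, `skill_scores`) are hypothetical, and the debiased variant is approximated here by scoring a reference ensemble of the same size drawn at random from climatology, which is one reading of "sampling errors in climatological probabilities considered in the reference forecasts". The forecasts are deliberately skill-free (members drawn from climatology), so an unbiased skill score should come out near zero.

```python
import random


def cumulative(probs):
    """Running sum of category probabilities."""
    total, out = 0.0, []
    for p in probs:
        total += p
        out.append(total)
    return out


def rps(forecast_probs, obs_category, norm=2):
    """Ranked probability score using an L-norm on cumulative probabilities.

    norm=2 is the usual quadratic score; norm=1 uses absolute differences.
    """
    cum_f = cumulative(forecast_probs)
    # Observed cumulative distribution: steps from 0 to 1 at the observed category.
    cum_o = [1.0 if k >= obs_category else 0.0 for k in range(len(forecast_probs))]
    return sum(abs(f - o) ** norm for f, o in zip(cum_f, cum_o))


def ensemble_probs(members, n_categories):
    """Category probabilities estimated as the fraction of members per category."""
    counts = [0] * n_categories
    for m in members:
        counts[m] += 1
    return [c / len(members) for c in counts]


def skill_scores(n_forecasts, ens_size, clim, rng, norm=2):
    """Simulate skill-free ensemble forecasts and return two skill scores.

    Returns (score against exact climatology,
             score against a resampled climatological ensemble).
    The second value is an assumed stand-in for a debiased reference.
    """
    cats = list(range(len(clim)))
    rps_fc = rps_cl = rps_ref = 0.0
    for _ in range(n_forecasts):
        obs = rng.choices(cats, weights=clim)[0]
        members = rng.choices(cats, weights=clim, k=ens_size)
        ref_members = rng.choices(cats, weights=clim, k=ens_size)
        rps_fc += rps(ensemble_probs(members, len(clim)), obs, norm)
        rps_cl += rps(clim, obs, norm)
        rps_ref += rps(ensemble_probs(ref_members, len(clim)), obs, norm)
    return 1.0 - rps_fc / rps_cl, 1.0 - rps_fc / rps_ref
```

Calling `skill_scores(20000, 5, [1/3, 1/3, 1/3], random.Random(0), norm=2)` with five members and three equiprobable categories should give a clearly negative first value (the small-ensemble bias of the quadratic score) and a second value close to zero, while `norm=1` should bring the first value close to zero as well. The exact numbers depend on the seed and the number of simulated forecasts.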
