Description:
This work is concerned with evaluating the performance of forecasts. We study two broad types of forecast: probabilistic forecasts, which take the form of a predictive distribution, and point forecasts. We focus on assessing calibration, that is, the statistical compatibility between forecasts and the realised observations.
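For concreteness, the classical example of such a compatibility condition for a continuous predictive distribution F_t with realisation Y_t is the uniformity of the probability integral transform (PIT), a standard condition in the forecast-verification literature that the definitions in this work generalise:

    Z_t = F_t(Y_t), \quad t = 1, \dots, T, \qquad Z_t \sim \mathrm{U}(0, 1).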
We generalise the definition of calibration for continuous predictive distributions. Our generalisation includes existing modes of calibration, such as marginal calibration and probabilistic calibration, as special cases, and introduces new modes. It also allows calibration to be assessed conditional on random variables of interest, such as the situation, region, or season, which gives more detailed information about calibration within subsets of forecasts than a single assessment over the entire set; a sketch of this conditional idea follows below. We construct measures of calibration within this framework by decomposing proper scores, and the decompositions yield novel measures of marginal calibration and probabilistic calibration as special cases.
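As a minimal sketch of the conditional idea (the helper names here are hypothetical, and the thesis's actual measures arise from score decompositions rather than from this simple histogram check), one can compute PIT histograms within the levels of a conditioning variable such as season:

    import numpy as np

    def pit_values(cdfs, obs):
        """PIT values Z_t = F_t(Y_t) for continuous predictive CDFs."""
        return np.array([F(y) for F, y in zip(cdfs, obs)])

    def conditional_pit_histograms(pit, groups, n_bins=10):
        """Relative-frequency PIT histogram within each level of a
        conditioning variable (e.g. season or region). Under probabilistic
        calibration conditional on that variable, every histogram is
        approximately uniform (all bars near 1 / n_bins)."""
        hists = {}
        for g in np.unique(groups):
            z = pit[groups == g]
            counts, _ = np.histogram(z, bins=n_bins, range=(0.0, 1.0))
            hists[g] = counts / len(z)
        return hists

For example, a U-shaped histogram within one season indicates underdispersed forecasts in that season, even when the histogram pooled over all seasons looks uniform.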
In addition to probabilistic forecasts, we consider the calibration of point forecasts, viewed as functionals of the predictive distribution. We define the calibration of such functionals and propose a general approach to producing criteria for assessing the conditional calibration of identifiable functionals such as moments and quantiles. Among non-identifiable functionals, we consider only the variance, which represents forecast uncertainty, and we derive criteria for its conditional calibration that require neither assuming a calibrated mean nor correcting the bias in the mean.
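To illustrate what a conditional calibration criterion for an identifiable functional can look like (a sketch using the standard identification function for the alpha-quantile, V(q, y) = 1{y <= q} - alpha; the thesis's general approach and its variance criteria are not reproduced here):

    import numpy as np

    def quantile_identification(q_forecasts, obs, alpha):
        """Identification function V(q, y) = 1{y <= q} - alpha for the
        alpha-quantile; its expectation is zero at the true quantile."""
        return (obs <= q_forecasts).astype(float) - alpha

    def conditional_mean_identification(v, groups):
        """Average of the identification function within each level of a
        conditioning variable. Conditional calibration of the quantile
        forecasts requires these averages to be close to zero."""
        return {g: v[groups == g].mean() for g in np.unique(groups)}

For instance, with alpha = 0.9, a markedly negative average in one region means that fewer than 90% of observations fall below the forecast quantile there, so the quantile forecasts are too low in that region.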
To assess the calibration of forecast means, we also produce a diagnostic graph based on local linear regression. We suggest a novel bootstrap approach to constructing confidence intervals for the conditional mean that takes heteroscedasticity and autocorrelation into account, and we conduct a simulation study to investigate the empirical coverage of these intervals. We illustrate our methods with a real data example: forecasts of the El Niño–Southern Oscillation.
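A minimal sketch of this type of diagnostic, assuming paired series of forecast means x and observations y; the local linear smoother and the moving-block bootstrap below are generic stand-ins, not the thesis's novel interval construction:

    import numpy as np

    def local_linear(x, y, grid, bandwidth):
        """Local linear regression estimate of E[Y | X = x0] on a grid."""
        fits = np.empty(len(grid))
        for i, x0 in enumerate(grid):
            w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)  # Gaussian kernel
            X = np.column_stack([np.ones_like(x), x - x0])
            sw = np.sqrt(w)
            # Weighted least squares; the intercept is the fit at x0.
            beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
            fits[i] = beta[0]
        return fits

    def block_bootstrap_band(x, y, grid, bandwidth,
                             block_len=20, n_boot=200, level=0.9):
        """Pointwise confidence band for the conditional mean from a
        moving-block bootstrap: resampling blocks preserves short-range
        autocorrelation, and resampling the pairs (x_t, y_t) jointly
        accommodates heteroscedasticity."""
        n = len(x)
        n_blocks = int(np.ceil(n / block_len))
        rng = np.random.default_rng(0)
        boot = np.empty((n_boot, len(grid)))
        for b in range(n_boot):
            starts = rng.integers(0, n - block_len + 1, size=n_blocks)
            idx = np.concatenate(
                [np.arange(s, s + block_len) for s in starts])[:n]
            boot[b] = local_linear(x[idx], y[idx], grid, bandwidth)
        lo, hi = np.percentile(
            boot, [100 * (1 - level) / 2, 100 * (1 + level) / 2], axis=0)
        return lo, hi

Plotting the local linear fit of observations against forecast means, together with this band and the 45° line, gives a diagnostic graph of the kind described above: where the band excludes the diagonal, the forecast means are miscalibrated.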