RE: Odd critique of statistical tools
Very good comments. Let's take them one at a time, and in order.
To be sure, dynamical modelers must also make subjective decisions, such as which processes to include in the model. Presumably, scale analyses can help set the relative priorities. Also, not everything can be included explicitly, and some important processes (like tropical convection) must be abbreviated using statistical parameterization schemes. In that sense, the choice of which physics to include in dynamical models, and how to include it, parallels the choice of the matching criteria in analog forecasting. However, the way in which analogs are composited into a single forecast is likely to be cruder than the way in which a full ensemble of model runs is used for a climate forecast. In the example shown here, only 6 past cases were selected, and each of them had obvious deviations from the current case over the selection period; most climate model ensembles include far more than 6 members.
Perhaps a more important difference between the two methods is the explicitness with which dynamical models represent the physics, compared with the black-box style of making an analog forecast. Although this in itself does not necessarily mean the long-term forecast skill will be higher in dynamical models, it does mean that part of the forecast job is nailed down more precisely by representing the underlying physics directly. Still, the bottom-line skill averages are what really determine the relative worth of the two approaches, so let's look at that. Since analog methods have generally not continued as a current way to develop climate forecasts, it is hard to compare their skill with that of today's leading dynamical models. It is assumed that dynamical models deliver higher skill, on average, than analog systems.
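To make the "composite of 6 selected cases" concrete, here is a minimal sketch of the analog procedure being discussed. All of the data here are synthetic placeholders (random numbers standing in for SST anomalies), and the names (`past_selection`, `past_outcome`, etc.) are my own, not from any operational system; the point is only the select-then-average logic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: ~60 past years, each with 12 months of SST
# anomalies over the selection period and 12 months of subsequent evolution.
n_years = 60
past_selection = rng.normal(size=(n_years, 12))  # prior-year SST anomalies
past_outcome = rng.normal(size=(n_years, 12))    # following-year anomalies
current = rng.normal(size=12)                    # the current case

# Rank past years by RMS distance to the current case over the selection
# period, and keep only the 6 closest matches.
dist = np.sqrt(((past_selection - current) ** 2).mean(axis=1))
best6 = np.argsort(dist)[:6]

# The forecast is simply the composite (mean) of the 6 analog outcomes --
# the "crude" compositing step compared with a full dynamical ensemble.
forecast = past_outcome[best6].mean(axis=0)
print(best6.size, forecast.shape)
```

The sampling problem described above lives in `best6`: with only ~60 years of record, even the closest 6 matches can deviate noticeably from the current case, and a different draw of near-matches would give a different composite.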
Perhaps this needs to be demonstrated formally, and of course it is possible that the analog approach would still match the dynamical approach, or that the skill difference would not be statistically significant. But I tend to doubt it. In a recent study comparing real-time forecasts from ENSO prediction models of different types since 2002, the dynamical models tended to have the highest skill. (See "Skill of real-time seasonal ENSO model predictions during 2002-11: Is our capability increasing?" by A. G. Barnston, M. K. Tippett, M. L. L'Heureux, et al., Bulletin of the AMS, 93, 631-651.)
My comparison between the analog method and more sophisticated statistical methods was in the context of the simple analog method used here, where only SST observations over the previous year were used, and where only about 60 years of record are available, so that only 6 cases qualified as being at least minimally similar to the current case. The lack of enough close matches (or even a single VERY close match) is worth emphasizing. Basing a forecast on a limited set of analogs does allow some nonlinearity to enter the forecast, but the luck of the draw in the selected similar cases makes the forecast implications questionably stable. In the case of Huug van den Dool's constructed analog, ALL years in the available record are used and given weights (positive or negative), so that the method becomes somewhat like a linear regression method, forming a bridge between analogs and multiple linear regression. Unless the nonlinear component of seasonal climate predictability is substantial, which it has not been shown to be, I favor regression (using all past years of data) over a limited set of analogs that tries to capture nonlinearity in addition to the linear components of the variability. I believe that with a small set of analogs, sampling variability usually outweighs the beneficial incorporation of nonlinearity. This is why I think that more traditional statistical methods are likely to have higher skill than a limited set of analogs, none of which match the current case extremely well.
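The constructed-analog idea can be sketched in a few lines: instead of picking a handful of close matches, solve for one weight per past year so that the weighted sum of all past patterns best reproduces the current pattern, then apply those same weights to the past outcomes. This is my own illustrative sketch with synthetic data, not van den Dool's actual implementation; it just shows why the method behaves like a regression.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical setup as before: rows are past years.
X = rng.normal(size=(60, 12))   # past selection-period SST anomalies
Y = rng.normal(size=(60, 12))   # corresponding following-year outcomes
current = rng.normal(size=12)   # the current case

# Constructed analog: least-squares weights (positive or negative), one
# per past year, so the weighted combination of past patterns matches the
# current pattern as closely as possible.
w, *_ = np.linalg.lstsq(X.T, current, rcond=None)

# Applying the same weights to the outcomes gives the forecast --
# algebraically close to multiple linear regression on all years at once.
forecast = Y.T @ w
print(forecast.shape)
```

Because every year contributes through `w`, the "luck of the draw" of a small analog set is replaced by the smoother sampling behavior of a linear fit, which is exactly the trade-off described above.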
About the set of analog-derived forecasts shown here for 2014-2015: the fact that it currently looks similar to the spread of the NCEP CFSv2 model does not mean the method is as good. To compare the skills of the two systems, a hindcast test covering all available years (ideally more than 30) would need to be conducted. A single case rarely tells us very much about what level of skill to expect over the long term.
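The hindcast test mentioned above can be sketched as a leave-one-out exercise: refit the forecast scheme with each target year withheld, forecast that year, and score the whole set at once. The data and the simple linear scheme here are placeholders of my own choosing; a real comparison would use the actual analog and CFSv2 hindcasts.

```python
import numpy as np

rng = np.random.default_rng(2)
n_years = 35                      # more than 30 years, as suggested above

# Synthetic predictor/observation pair with a modest built-in relationship.
predictor = rng.normal(size=n_years)
observed = 0.5 * predictor + rng.normal(scale=0.8, size=n_years)

# Leave-one-out hindcast: the target year never informs its own forecast,
# so the skill estimate is not inflated by fitting to that year.
hindcasts = np.empty(n_years)
for i in range(n_years):
    mask = np.arange(n_years) != i
    slope, intercept = np.polyfit(predictor[mask], observed[mask], 1)
    hindcasts[i] = slope * predictor[i] + intercept

# One summary skill score over all years, rather than a single case.
skill = np.corrcoef(hindcasts, observed)[0, 1]
print(round(skill, 2))
```

Running the same loop for each competing system on the same set of years is what would put the analog forecasts and the dynamical model forecasts on a comparable footing.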