S2S reboot: An argument for greater inclusion of machine learning in subseasonal to seasonal forecasts

The discipline of seasonal climate prediction began as an exercise in simple statistical techniques. However, today the large government forecast centers almost exclusively rely on complex fully coupled dynamical forecast systems for their subseasonal to seasonal (S2S) predictions while statistical techniques are mostly neglected and those techniques still in use have not been updated in decades. In this Opinion Article, we argue that new statistical techniques mostly developed outside the field of climate science, collectively referred to as machine learning, can be adopted by climate forecasters to increase the accuracy of S2S predictions. We present an example of where unsupervised learning demonstrates higher accuracy in a seasonal prediction than the state‐of‐the‐art dynamical systems. We also summarize some relevant machine learning methods that are most applicable to climate prediction. Finally, we show by comparing real‐time dynamical model forecasts with observations from winter 2017/2018 that dynamical model forecasts are almost entirely insensitive to polar vortex (PV) variability and the impact on sensible weather. Instead, statistical forecasts more accurately predicted the resultant sensible weather from a mid‐winter PV disruption than the dynamical forecasts. The important implication from the poor dynamical forecasts is that if Arctic change influences mid‐latitude weather through PV variability, then the ability of dynamical models to demonstrate the existence of such a pathway is compromised. We conclude by suggesting that S2S prediction will be most beneficial to the public by incorporating mixed or a hybrid of dynamical forecasts and updated statistical techniques such as machine learning.

The discipline of seasonal climate prediction began as an exercise in simple statistical techniques. However, today the large government forecast centers almost exclusively rely on complex fully coupled dynamical forecast systems for their subseasonal to seasonal (S2S) predictions while statistical techniques are mostly neglected and those techniques still in use have not been updated in decades. In this Opinion Article, we argue that new statistical techniques mostly developed outside the field of climate science, collectively referred to as machine learning, can be adopted by climate forecasters to increase the accuracy of S2S predictions. We present an example of where unsupervised learning demonstrates higher accuracy in a seasonal prediction than the state-of-the-art dynamical systems. We also summarize some relevant machine learning methods that are most applicable to climate prediction. Finally, we show by comparing real-time dynamical model forecasts with observations from winter 2017/2018 that dynamical model forecasts are almost entirely insensitive to polar vortex (PV) variability and the impact on sensible weather. Instead, statistical forecasts more accurately predicted the resultant sensible weather from a mid-winter PV disruption than the dynamical forecasts. The important implication from the poor dynamical forecasts is that if Arctic change influences mid-latitude weather through PV variability, then the ability of dynamical models to demonstrate the existence of such a pathway is compromised. We conclude by suggesting that S2S prediction will be most beneficial to the public by incorporating mixed or a hybrid of dynamical forecasts and updated statistical techniques such as machine learning. It is estimated that 2.7 trillion dollars of the U.S. economy alone is sensitive to the impacts of weather and climate (National Oceanic and Atmospheric Administration, 2002). Improving our ability to forecast the weather and climate is of interest to all sectors of the economy and government agencies from the local to the national level. In recent years, seasonal climate predictive skill associated with ENSO, a question remains-what are the prospects for skill improvement in the extratropical latitudes where a large fraction of climate variability still remains unpredictable? Studies of the comparable skill between ENSO seasonal forecasts made by dynamical and statistical models found that, in general, dynamical models outperformed statistical models (Barnston et al., 2012). This is expected given that the focus of effort and resources has been on dynamical forecast systems while statistical models have been mostly neglected. And while dynamical models are constantly improved and updated, statistical models have basically remained unchanged from the 1990s and even the 1980s (Barnston et al., 2012). An influential National Academies of Science report recommended improvements on subseasonal to seasonal (S2S) predictions that were exclusively directed at dynamical prediction systems including data assimilation, parameterization and multimodel approaches, while statistical models were ignored. The report recommended that "To summarize, investment in research aimed at physical understanding and reducing (dynamical) model errors is seen by this committee as a top priority for improving the skill of S2S predictions." (National Academies of Sciences, Engineering, and Medicine, 2016) In this Perspective, we argue that statistical methods still have an important place in S2S forecasting. Although development of new statistical techniques has not been a focus of the climate community, new statistical techniques have been developed and utilized in other disciplines to manipulate large datasets. We argue that these new statistical techniques, often referred to collectively as "machine learning," improve on traditional statistical techniques with higher forecast accuracy. Recent work shows that these new statistical techniques are often more accurate than the latest generation of dynamical coupled atmosphere-ocean models on S2S timescales. We present two examples where newly developed statistical techniques demonstrate higher skill in hindcasts than both traditional CCA and dynamical models.
In addition, statistical forecasts have the advantage of utilizing continuously acquired knowledge obtained from data analysis of climate variability, which can readily be applied (Doblas-Reyes et al., 2013). In testing the utility of a boundary forcing such as sea ice or snow cover in improving forecast skill, using a statistical model is much easier and faster to employ and implement than a dynamical model. Since statistical models are computationally more expedient than GCMs, they can be used to efficiently search for "windows of predictability (i.e., the ability to predict)," that is, regions, states or timescales that are linked to low frequency processes with better predictive skill. And although dynamical models incorporate boundaries such as ocean variability, snow cover, sea ice, soil moisture and even the stratosphere, model errors could mask potential skill from these possible forcings.
We further argue that even traditional statistical models can be used to uncover weaknesses and guide improvements in dynamical model simulations. Observational analysis has supported that Arctic variability influences mid-latitude weather while most modeling studies have supported mostly no relationship or that only a weak relationship exists (Cohen, Screen, et al., 2014;Overland et al., 2015;Vihma, 2014). Additionally, modeling studies have been used to criticize the observational analysis (Cohen, Screen, et al., 2014). We show, however, that statistical model forecasts using Arctic predictors outperformed dynamical models in predicting winter 2017/2018 surface temperature anomalies and exposing serious shortcomings of dynamical models and an inherent insensitivity to Arctic forcings.

| DATASETS
In this study, we used several precursor fields in autumn (September, October, November [SON]), including monthly sea ice concentration (SIC) from the Met Office Hadley Centre at 2.5 latitude X 2.5 longitude (Rayner et al., 2003) and the domain for the SIC is 60 -90 N latitude and 0 W-180 E longitude and for the time period of 1965-2015. We further include monthly snow cover extent (SCE) from the Rutgers Snow Lab with data provided by the National Oceanographic and Atmospheric Administration (NOAA; Robinson, Estilow, & Program, 2012) using the area between 20 -60 N latitude and 0 -180 E longitude. In addition, we include monthly SST from the Met Office Hadley Centre at 1 × 1 (Rayner et al., 2003) over three different regions: the North Atlantic (10 -70 N latitude and −110 W-20 E longitude), the Mediterranean (25 -50 N and 6 W-45 E) and the Pacific region (30 S-30 N and 150 E-70 W). In the Atlantic region SST, we masked portions corresponding to the Pacific Ocean, Mediterranean as well as Hudson Bay. Similarly, we masked the Atlantic region in the selected Pacific Ocean precursor region. Finally, we included as atmospheric precursors the geopotential height at 500 hPa over 20 -80 N and 10 E-180 E and sea level pressure (SLP) over 40 -80 N and 0 -360 from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis available four times daily at a global resolution at 2.5 × 2.5 resolution (Kalnay et al., 1996). We averaged daily values to create both monthly and seasonal averages. All regions are selected by examining the significance of precursor composites as calculated by a bootstrap method.
For the winter European temperature forecast we used monthly averaged temperature anomalies on a 0.5 × 0.5 grid provided by the NOAA (Fan & van den Dool, 2008). We used the same area as for the precipitation set (25 -75 N latitude and 20 W-45 E longitude). In addition, we included a greater and shifted SST area for the North Atlantic region (5 S-50 N and 90 -6 W) to account for the North Atlantic Oscillation (NAO) pattern, and excluded the Mediterranean region for the temperature anomaly forecast. For Northern Hemisphere surface temperature data we used the NCEP/NCAR reanalysis available four times daily at a global resolution of~1.9 (Kalnay et al., 1996). We averaged daily values to create both monthly and seasonal averages.

| FORECAST METHODS
In the paper we compare the hindcast skill of three forecast methodologies. The first method is CCA, which is the traditional statistical method employed by forecast centers for long range weather or climate prediction. We also use a new method of statistical prediction based on hierarchical clustering analysis developed by Totz, Tziperman, Coumou, Pfeiffer, and Cohen (2017). The advantage of the clustering-based forecasting method over index-based regression models (e.g., Cohen & Fletcher, 2007) is that it accounts not only for the amplitude of the predictors but also for the geographical distribution using clustering techniques. Its advantage over CCA, which is based on principal component analysis is that, unlike principal components, the clusters do not need to be orthogonal and therefore represent variability patterns more faithfully. Further details on the hierarchical clustering analysis are provided in Supporting Information File S1. A complete description of how hierarchical clustering techniques can be used for forecasting is described in Totz et al. (2017).
We also present a real-time monthly and seasonal forecast based on multilinear regression (Cohen & Fletcher, 2007). The linear regression model is not novel of course but does use mostly Arctic predictors for the winter forecast. The major predictors for the winter are predicted winter ENSO (Niño 3.4 index), September Arctic SIC, October Eurasian SCE and October SLP anomaly in northwestern Asia. No detrending is applied to the data in producing the forecast. Three of the four predictors are high latitude variables with the only tropical predictor being ENSO. Cohen and Fletcher (2007) demonstrated improved hindcast temperature skill over dynamical model hindcast skill using linear regression and Arctic predictors. However, the variance of the model is damped relative to the observations. In those regions where the model has higher skill such as the eastern United States and northern Eurasia, the model variance is 70-80% of the observed values. This model is run operationally prior to each winter and the forecast is posted to the National Science Foundation website (https://www.nsf.gov/news/special_ reports/autumnwinter/predicts.jsp). Subsequent studies have supported the use of SIC and SCE as predictors for wintertime temperature anomalies (Furtado, Cohen, Butler, Riddle, & Kumar, 2015;García-Serrano et al., 2016;Gastineau, García-Serrano, & Frankignoul, 2017). Since the model utilizes October data, a preliminary forecast is usually generated the end of October with the forecast updated the first week of November. The forecast for winter (December to February) 2017/2018 was created November 8, 2017.
We show the winter temperature anomaly forecast for the Northern Hemisphere from winter 2017/2018. The statistical forecast is compared to the NMME suite of models. In addition, we also compare to the real-time forecasts from an international suite of models including the CFSv2 (climate forecast system version 2), the European Centre for Medium-range Weather Forecasting (ECMWF) model (Stockdale et al., 2011), the UK Met Office Hadley Center Unified Model (Walters et al., 2017), and the MeteoFrance model (Voldoire et al., 2013). These four models are referred to as the International Multi-Model Ensemble (IMME). In the present work, we evaluated NMME and IMME forecasts from November for the December, January and February period, and from December for the January, February and March period.

| ALTERNATIVE STATISTICAL LEARNING APPROACHES
In this Perspective article, we bring examples of statistical techniques applied to S2S prediction using clustering analysis and multilinear regression. However, there are other examples of new techniques of supervised learning often referred to collectively as "machine learning" that may be applied and possibly provide improved forecast skill to operational S2S predictions. These techniques are briefly described in File S1. In the following section, we present hindcast skills from an unsupervised learning technique known as hierarchical clustering.

| EUROPEAN WINTER PRECIPITATION HINDCASTS
In a recent paper (Totz et al., 2017), we compared hindcast skill of winter (December-January-February) precipitation for the European and Mediterranean regions for CCA, hierarchical clustering and from the NMME dynamical models. In Totz et al. (2017), the cross validated correlation of the cluster-based hindcasts with observations for the years 1967-2016 were shown. Also shown were the cross-validated correlations of the CCA-based hindcasts with observations for the years 1967-2016. In addition, the cross-validated correlations were shown for hindcasts for the years 1982-2010 from the NMME dynamical models. Finally, to make a direct comparison with the NMME, the cross-validated correlation of the cluster-based hindcasts with observations for the years 1982-2010 were shown as well.
The hierarchical clustering used four fall (three-month average of September, October, and November) predictors for predicting winter precipitation in the European and Mediterranean regions: SCE across Eurasia, SIC in the Arctic and SST in the North Atlantic and Mediterranean regions. The conclusion of that study was that the forecasts based on hierarchical clustering had higher skill than both the more traditional CCA statistical model and the NMME suite of dynamical models. Here we include a similar figure of forecast skill slightly modified from the techniques presented in Totz et al. (2017) (see Figure 1). The hierarchical clustering exhibits improved skill in representing the overall pattern of precipitation anomalies and in the skill score of individual locations throughout the region. Totz et al. (2017) only applied hierarchical clustering to precipitation forecasts. We performed a similar analysis for temperature forecasts again for the European sector. We used hierarchical clustering and the Ward method to identify patterns of winter temperature variability and found six dominant clusters ( Figure 2). The first cluster shows east-west temperature variability, whereas Clusters 2 and 5 exhibit homogenous warm and cold temperature patterns, respectively. Clusters 3 and 4 exhibit north-south temperature variability that is the classic signature of the NAO. The last cluster exhibits positive temperature anomalies over northern Europe and negative temperature anomalies in southern Europe. We then matched the six clusters with seven fall predictors for predicting winter temperature in the European and Mediterranean regions: SCE across Eurasia, SIC in the Arctic, SST in the North Atlantic region, Mediterranean region and Pacific region as well as geopotential height over Eurasia and SLP over the mid-to high-latitudes of the Northern Hemisphere. We implement a few improvements relative to Totz et al. (2017) as follows. First, we remove the mean of each precursor using all data except for the predicted year. Next, we normalize each precursor by its standard deviation. Finally, we combine different precursors into a single vector. In addition, we define a threshold for the calculation of the singular value decomposition-based pseudo-inverse used to calculate the prediction coefficients such that singular values that are smaller than 1% of the largest singular value are set to zero.

| WINTER SURFACE TEMPERATURE HINDCASTS
All hindcasts were cross-validated, that is, the precursors for the year hindcasted were not included in building the regression coefficients used for the temperature hindcast of that year. We calculated the mean cross-validated skill using all possible configurations of the seven precursors: using all possible seven single precursors, using all possible pairs of precursors chosen out of the possible seven, etc. There is a total of 127 models based on these different precursor choices. Out of these, 16 show a skill higher than 0.1, 12 a skill higher than 0.15, and 12 higher than 0.2. Note that the NMME skill is 0.02. Thus the clustering-based forecast has an improved forecast skill that is robust to model details. We find that the mean skill has the highest value (0.21) using the three predictors: Atlantic SST, Pacific SST, and geopotential height over Europe. These precursors were also used to compare the spatial skill over Europe with CCA and NMME ( Figure 3). Our results once again demonstrate that forecasts based on hierarchical clustering had higher skill than both the more traditional CCA statistical model and the NMME suite of dynamical models. The hierarchical clustering exhibits improved skill in representing the overall pattern of surface temperature anomalies and in the skill score of individual locations throughout the region.
We are currently applying hierarchical clustering to temperature forecasts and across North America also in winter. Results are not available at the time of submission of this manuscript but we are preparing analysis for a future publication.

| NORTHERN HEMISPHERE WINTER SURFACE TEMPERATURE FORECAST -WINTER 2018
Government operational forecast centers almost exclusively rely on dynamical coupled forecast systems in generating and disseminating seasonal forecast products with some notable exceptions (e.g., the Indian Meteorological Department uses statistical methods for monsoon forecasts). In contrast, AER produces an operational seasonal forecast using a statistical model (Cohen & Fletcher, 2007).
In Figure 4 we show two dynamical, December to February 18, 2017, forecasts from the NMME suite of dynamical models and the IMME suite of dynamical models to compare with the AER seasonal forecast model (the Northern Hemisphere winter forecast was posted on November 27, 2017: https://www.aer.com/siteassets/ao-archives/ao-update-27-nov-17.pdf). In Figure 4 we also show the January to March 2018 forecasts from the NMME suite of dynamical models, the IMME suite of dynamical models and the AER seasonal forecast model.
All the dynamical models predict above normal, to even well above normal, temperatures across the Eurasian continent. The forecasts across the United States were for normal to above normal temperatures but the dynamical models did predict a FIGURE 2 Clusters of temperature winter (December, January, and February) anomalies for the European region, ordered by their frequency: Cluster (a) has a frequency of 32%, (b) a frequency of 20%, (c) a frequency of 18%, (d) a frequency of 14%, (e) a frequency of 8%, and (f) a frequency of 8% more regional area of below normal temperatures in southeastern Alaska and northwestern Canada. The cold temperature forecasts in Alaska, Canada, and the Pacific Northwest are likely related to the predicted La Niña for winter 2017/2018. Correlations of ENSO with surface temperatures show a positive correlation in western North America ( Figure 5), that is, La Niña favors cold temperatures in southeast Alaska, western Canada, the Pacific northwest and east towards the Canadian Plains.
In contrast to the dynamical model forecasts, the statistical model forecast predicted normal to below normal temperatures for northern Asia, East Asia, and northern Europe. And in North America, the model predicted normal to below normal temperatures for western Canada into the upper midwest of the United States, the Great Lakes and most of the eastern United States. The model also predicted above normal temperatures for the Mediterranean region, the Middle East, southern Asia, the North American Arctic and the southwestern United States. The statistical model based on Arctic predictors correctly predicted cold across large parts of Asia, including east Asia, western Europe, central Canada and the upper midwest of the United States. The predictors that were the biggest contributors to the cold forecast across northern Eurasia were October SLP anomalies and September SIC. The biggest contributors to the cold forecast in the eastern United States were October SLP anomalies and October SCE and in the western United States it was the predicted La Niña. These are regions that were not predicted to be cold across the suite of dynamical models. Cold temperatures were more widespread across the Northern Hemisphere mid-latitudes in the latter half of winter consistent with previous studies, which argued that cold temperatures in the era of Arctic amplification have become more dependent on polar vortex (PV) disruptions that typically occur in mid-to late-winter (Cohen, Barlow, & Saito, 2009;Cohen, Pfeiffer, & Francis, 2018). In addition to the more accurate cold temperature forecast across the continents, the statistical model predicted more amplified warming across the Arctic and closer to the observed compared to the dynamical model forecasts.
The cold forecast by the statistical model but absent in the dynamical models suggests the more accurate cold forecast based on the statistical model is related to Arctic variability and is missing from the dynamical models. The more accurate winter forecast using Arctic variables is supportive of a recent study that showed Northern Hemispheric atmospheric trends are more consistent with Arctic trends than with tropical trends (Cohen, 2016). In this regard, the statistical model can be used to inform or guide improvements in the dynamical models. This argument of the limited ability of dynamical models to simulate Arctic or high latitude processes and coupling with the atmosphere is discussed in greater detail on the subseasonal scale. In addition to the overall winter seasonal forecast in winter 2017/2018, much attention was given to a significant PV disruption that resulted in a PV split and widespread severe winter weather across the Northern Hemisphere starting in mid-February including record cold and disruptive snowfalls both in Europe and the United States (https://mashable. com/2018/02/15/polar-vortex-split-stratospheric-warming-snow-cold-europe-us/#nMlfNYJ2qmqn). There was also some celebration in the climate community on how well the event was predicted in advance. On February 12, an ongoing PV disruption officially achieved major mid-winter warming (MMW; defined as a reversal of the zonal wind from westerly to easterly at 60 N and 10 hPa) status. The PV split into two pieces and was one of, or possibly the most extreme, PV disruptions observed in the month of February in the satellite era (since 1979) as defined by the magnitude of the easterly wind at 60 N and 10 hPa and the anomalously warm temperatures in the polar stratosphere (averaged between 60-90 N and 10 hPa).  It has been known for some time that following MMWs, the Northern Hemisphere atmospheric circulation is altered up to 2 months including a shifted storm track, storm frequency and temperature anomalies (Baldwin & Dunkerton, 2001;Kolstad, Breiteig, & Scaife, 2010). It has also been shown that following PV disruptions, the Arctic Oscillation is predominantly in the negative phase (Baldwin & Dunkerton, 2001), which favors increased snowfall across Europe and the northeastern United States (Cohen, Ye, & Jones, 2015). Following the MMW during the second week of February 2018, temperatures turned much colder across Europe along with disruptive snowfalls that continued through late March. In North America, initially temperatures turned colder and storminess increased across the western portion of the continent but with time the cold air and storminess transitioned to the eastern United States. The northeastern United States experienced four nor'easters in relative quick succession, each with heavy snowfall reported (50-100+ cm) during the first 3 weeks of March.
Long lead forecasts of PV disruptions can benefit regional and local municipalities and businesses with advanced warning to prepare for possible inclement winter weather. A forecast based on statistical techniques predicted a significant PV disruption nearly a month in advance. In a weekly blog postdated January 15, 2018, the first author wrote "Our PV forecast model predicts that the PV disruption will peak the second week of February." Based on the expected PV disruption, the first author further wrote "This anticipated stratospheric disruption is likely to differ from the stratospheric PV disruption in late December where the resultant below normal temperatures were focused across North America while much of Eurasia remained relatively mild….I expect the focus of the resultant cold temperatures to be across northern Eurasia, especially Siberia. Temperatures would likely average below normal across much of Siberia and likely elsewhere across northern Eurasia." (The full blog is posted here: https://www.aer.com/siteassets/ao-archives/ao-update-15-jan-18.pdf) In contrast, dynamical predictions of the PV disruption only came two or more weeks later. A New York Times Op-Ed celebrated as an achievement, the forecast in early February of the PV disruption by dynamical models ("Both the sudden stratospheric warming and the Arctic warming were forecast with outstanding accuracy" from Hot Times in the Arctic March 14, 2018). Support for the outstanding forecasts was a discussion from February 3, 2018 (https://www.nymetroweather. com/2018/02/03/sudden-stratospheric-warming-increasingly-likely-mean/). A NOAA blog also argues that the numerical weather prediction models began to simulate the MMW in the late January and early February time frame (Butler, 2018). But the PV would split just a week later and achieve MMW status less than 10 days later. In addition, the greatest poleward heat flux into the polar stratosphere ever observed occurred that week and was well-predicted by the models making a PV disruption all but inevitable just a few days or a week later. Given the fewer degrees of freedom in the stratosphere, the lack of topography and diabatic heating sources and the large wavelengths of the atmospheric circulation, it is questionable whether a correct forecast of an extreme stratospheric event only 1 week to 10 days in advance should be celebrated as an outstanding achievement.
Based on Arctic variability the expectation of a significant PV disruption was noted even months earlier. The first author in a blog dated November 20, 2017 predicted a significant PV disruption in the late winter (the full blog is posted here: https://www.aer.com/siteassets/ao-archives/ao-update-20-nov-17.pdf). This prediction was based on below normal November Arctic SIC, above normal Eurasian October SCE, an easterly quasi-biennial oscillation (QBO) and a slightly negative snow advance index (Cohen & Jones, 2011a). The forecast for the winter surface temperature anomalies were included as well. The reasoning was partially based on two recent studies demonstrating that above normal Eurasian SCE coupled with an easterly  (Peings, Douville, Colin, Martin, & Magnusdottir, 2017;Tyrrell, Karpechko, & Räisänen, 2018).
The correct forecast of a significant PV disruption in late winter issued both in the fall and in mid-January was based on Arctic or high latitude variability and applied to statistical forecasts. The expectation of a significant PV disruption in November was based on Arctic sea ice and snow cover anomalies. However, the forecast in mid-January of a significant PV disruption the second week of February was mostly based on SLP anomalies across the high latitudes and surface temperatures anomalies across northern Asia. It has been previously shown that a tripole pattern in SLP anomalies with positive SLP anomalies across northwestern Eurasia, centered near the Urals, and negative SLP anomalies both upstream and downstream in the North Atlantic and the North Pacific, respectively, is favorable for disrupting the PV Cohen & Jones, 2011b). Comparison of January 2018 SLP anomalies and the SLP tripole pattern ) favorable for PV disruptions shows a strong similarity ( Figure 6). Furthermore, a statistical model using SLP anomalies to predict the strength of the PV predicted a weakening of the PV that peaks the second week of February, a month in advance. Further support for a significant PV disruption were the cold temperatures across northern Asia during January. Kretschmer et al. (2018) showed that cold temperatures across northern Asia exist prior to significant PV disruptions. This is an example of using statistical techniques and Arctic variability provided a longer lead time to predict a significant PV disruption than dynamical model forecasts.
Besides Arctic variability, tropical variability has been proposed as a forcing/predictor of PV behavior (Butler & Polvani, 2011). In addition to phases of ENSO possibly forcing PV disruptions, the Madden-Julian Oscillation (MJO) has been proposed as a possible forcing/predictor of PV behavior. Specifically, observational analysis showed that there is an increased frequency to PV disruptions between 25 and 36 days following MJO Phase 3 and there is an increased frequency to PV disruptions between 1 and 12 days following MJO Phase 7 (Garfinkel, Feldstein, Waugh, Yoo, & Lee, 2012). Consistent with this study, the MJO was in Phase 3 in mid-January and in Phase 7 in early February. Therefore, it is possible that not only did Arctic/high latitude variability contribute to the PV disruption the second week of February but so did tropical convection. Statistical or empirical techniques using high latitude and tropical variability would have provided up to a month lead forecast of a PV disruption in early to mid-February.
Of course no one lives in the stratosphere and a successful forecast of stratospheric variability is only of importance if it results in an improved forecast of the sensible weather. For society the importance of the PV disruption is that it initiated an increase in severe winter weather, including cold temperatures and disruptive winter storms across the Northern Hemisphere including Europe and the United States for the following 3 months. It has been shown that for PV disruptions of the magnitude observed in February 2018, the temperature response is below normal temperatures widespread across northern Eurasia including Europe but with the largest negative temperature anomalies focused in Siberia (Kretschmer et al., 2018). We show in Figure 7 the monthly temperature anomaly forecasts from the suite of dynamical models for the 2 months when temperature anomalies responded to the PV disruption-February and March. Both the American suite of dynamical models (NMME) and the international suite of models (IMME) predict overall above normal temperatures across all of Eurasia and the United States with the only cold predicted in western Canada, which as discussed with Figure 5 is likely associated with the observed winter La Niña.
We also include the observed temperature anomalies from February to March 2018. In contrast to the model forecasts, the observations show widespread cold temperatures for both months across northern Eurasia including Europe. The dynamical models are initialized with observations a month in advance, therefore, it is unlikely that the models were correctly predicting a PV disruption in early February for the February forecast initialized on January 1. However, for the March forecast initialized on February 1, the models should have correctly simulated a significant PV disruption as discussed above. Yet there is no temperature response to the PV disruption reflected in the March dynamical forecasts, which is nearly identical to the February anomaly temperature forecasts, in sign and in pattern. This is consistent with previous model analysis demonstrating that the circulation anomalies associated with PV disruptions fail to propagate from the stratosphere into the troposphere as seen in the   (Furtado et al., 2015). It does appear from comparing dynamical model temperature anomaly forecasts with the observed temperature anomalies, that it is almost irrelevant to the models whether a PV disruption occurs or not. Or, in other words, the dynamical model's tropospheric response is nearly insensitive to PV disruptions and by extension, if Arctic variability influences mid-latitude weather through altering PV behavior, then the models are insensitive to Arctic variability. This lack of sensitivity to Arctic variability needs to be considered as a serious shortcoming of the dynamical models that limits the capability of dynamical models accurately predicting Northern Hemisphere winter temperature variability.
We also include statistical forecasts for the months of February and March generated in the fall. The forecasts include no information about the PV disruption in early February but do include the Arctic variables September Arctic SIC and October Eurasian SCE. Previous studies link both below normal Arctic SIC and above normal October Eurasian SCE to a PV disruption in mid-to late-winter followed by widespread cold temperatures across the Northern Hemisphere continental midlatitudes (Furtado, Cohen, & Tziperman, 2016;García-Serrano et al., 2016;Gastineau et al., 2017). This expectation of a PV disruption late in the winter was published on the AO blog and on the National Science Foundation website (https://www.nsf. gov/news/special_reports/autumnwinter/predicts.jsp). Although the statistical model forecasts are not perfect, the model did correctly predict below normal temperatures across northern Eurasia for both March and especially February, and supports the idea that the below normal temperatures across northern Eurasia are related to Arctic variability.
Finally, from observational analysis it can be concluded that Arctic variability contributed to the PV disruption of February 2018 or that MJO variability contributed to the PV disruption of February 2018. From the dynamical model forecasts it can be concluded that both Arctic and tropical variability are not significant contributors to mid-latitude variability but one cannot conclude that tropical variability is a significant contributor to mid-latitude variability while Arctic variability is not. This has important implications for the argument that dynamical models support the influence of tropical forcing but not Arctic forcing on mid-latitude weather variability.

| CONCLUSION
Over the past two decades, most of the effort and the resources at government-sponsored forecast centers have been dedicated to atmosphere-ocean coupled dynamical models. Current statistical techniques have not been updated since the 1990s and even the 1980s. An influential National Academy of Sciences report on S2S prediction recommended that new resources be dedicated to improving dynamical models but ignored statistical models, contributing to the migration away from statistical models and towards dynamical models. As a consequence, statistical techniques and models are currently mostly ignored at the government-sponsored forecast centers.
We argue that the lack of attention, resources and implementation of statistical techniques is a mistake and deprives the public of immediate and relatively inexpensive improvements to S2S prediction. As we describe above, new statistical techniques, often labeled as machine learning, have been developed that are far more powerful at mining data and recognizing patterns than traditional techniques that can be applied to delivering to the public more skillful forecasts.
As we demonstrated above using seasonal hindcasts, new statistical techniques can provide more skillful forecasts of both precipitation and surface temperature than the current state-of-the-art coupled dynamical systems. The statistical model that we used employed hierarchical clustering and included fall Arctic boundary forcings as predictors. We showed this for Europe and hope to demonstrate shortly for North America as well. We also discussed other novel machine learning techniques developed in other disciplines that may be appropriate to apply to the S2S prediction problem.
Besides demonstrating improved skill for hindcasts, we showed that statistical techniques provided a more accurate winter temperature forecast for winter 2017/2018, better representing the observed "warm Arctic/cold continents" (Cohen, Jones, Furtado, & Tziperman, 2013) pattern than the dynamical models. In addition, statistical or empirical analysis and models can be exploited to guide improvements in dynamical model development. As we discussed above, an extreme PV disruption developed in early February 2018 and achieved MMW status on February 12. Following the PV disruption, cold temperatures and disruptive snowfalls became widespread across northern Eurasia including Europe and across the United States for the next 2 months. The increase in severe winter weather is consistent with observational analysis of the atmospheric response to PV disruptions (Kretschmer et al., 2018).
In contrast to observed temperature anomalies, the dynamical models predicted a relatively mild winter for all months from December to March, both across Eurasia and much of North America. The lone region predicted to experience a cold winter was northwestern North America and is likely related to the predicted and verified La Niña. Little change is seen in the dynamical forecasts between the seasonal mean and individual months. As we showed, there is little change in the hemispheric temperature anomaly dynamical model forecast between February and March temperature anomalies despite that the dynamics of the PV disruption were included in the initialization of the PV disruption for the March forecast initialized in early February but was likely absent in the February forecast initialized in early January. Below normal temperatures became widespread across the hemisphere following the PV disruption both in February and especially March. Below normal temperatures were already present across northern Asia at the end of January and into early February. Dynamical model forecasts for March were initialized with both the cold temperatures in Asia and the beginnings of the PV disruption. Yet despite the initial conditions, the dynamical models predicted universal warmth across Eurasia for the month of March. This required both incorrectly advecting warm temperatures from the Arctic or the subtropics across the continent, which negated and reversed the cold temperatures already present, and neglecting the atmospheric circulation response to an extreme PV disruption. The possible incorrect advection of the warmth generated in the Arctic from sea ice loss too far south across the continents is consistent with a previous study showing that Arctic warming related to sea ice loss extends further south in the models relative to the observations (Cohen et al., 2013).
Therefore, in winter 2017/2018 the dynamical models predicted a similar temperature anomaly pattern on both climate and synoptic timescales. Climate timescales are dominated by boundary forcings and natural variability. Synoptic timescales are dominated by initial conditions. Yet the dynamical models failed to predict the observed warm Arctic cold continents pattern on longer climate leads in the fall and on shorter synoptic scales in early February for the February and March temperature forecasts. So whether the dynamical models were initialized with the beginnings of the PV disruption or not, they struggled to predict the temperature response to the PV disruption until mid-February (Butler, 2018), when the circulation anomalies associated with the stratospheric PV disruption descended to the troposphere ( Figure S1). The consistency between the dynamical model forecasts across timescales does not support that differences between observed and predicted temperatures can be attributed to only natural variability. The dynamical models only correctly predicted the warm Arctic cold continents pattern once they were initialized with a weakened PV in the upper troposphere.
In contrast, the observed temperature anomalies following the PV disruption closely follow the temperature response to such events derived from one of the new statistical techniques discussed above, hierarchical clustering. Also, a statistical model forced with Arctic variability better predicted the late winter hemisphere-wide cold than the dynamical models. In this regard, statistical models can help guide model developers to improve the physical processes in the coupled dynamical models, for example, high latitude processes that involve the PV.
Statistical predictions in general and machine learning techniques in particular are likely to improve subseasonal climate forecasts where there are repeatable patterns in the atmosphere and where there are fairly consistent sequences of events. We provided an example where statistical predictions can help improve temperature predictions following a PV disruption relative to forecasts that are derived solely from dynamical systems. For many PV disruptions there is an identifiable tropospheric wave pattern followed by a weakening of the PV and then cold air outbreaks across the Northern Hemisphere continents but most favored across northern Eurasia. As demonstrated above, dynamical models struggle with simulating this sequence of events. It seems reasonable that weather predictions associated with blocking events and/or periods of amplified known teleconnection patterns would also benefit from statistical techniques and machine learning.
Seasonal prediction had humble beginnings when simple and easy-to-employ statistical techniques such as persistence or composites were used to generate forecasts. During the 1980s, more sophisticated statistical techniques such as CCA were incorporated into seasonal prediction. Then, beginning in the 1990s, forecasts from dynamical models of increasing complexity were introduced into the operational production of S2S forecasts while inclusion of statistical techniques was phased out. We argue that currently the pendulum has swung to the other extreme where S2S forecasts are almost exclusively derived from the coupled dynamical systems while new techniques broadly referred to as "machine learning" that can both improve forecast skill and improve coupled dynamical systems are being ignored. The introduction of new statistical techniques into the operational forecast centers would benefit the public and ultimately the public is best served by hybrid forecasts utilizing both state-of-the-art dynamical models and updated statistical techniques.