A winter landscape photo: a group of trees covered in snow.

White Christmas in Innsbruck?

Will we have a white Christmas in Innsbruck in 2022? As of nine days before Christmas, the answer has become quite clear. A machine-learning model that combines atmospheric science, meteorological observations and statistical expertise estimates how high (or low) the probability of waking up to a white landscape on December 24, 2022, is.

At this time of the year, many of us are interested in the question: Will we have a white Christmas? To understand how this question can be answered, one first needs to understand how weather is forecast. Our atmosphere is a very complex and chaotic system, but it is governed by physical principles that are, in large part, well understood. Starting from the current state of the atmosphere, i.e., the current weather around the globe, these physical principles can be used to calculate the state of the atmosphere a few hours or days ahead. This is known as numerical weather prediction (NWP).

Nowadays, this is done using high-performance supercomputers which run NWP models to forecast the weather up to several weeks ahead. As neither the current state of the atmosphere nor all involved physical processes are known exactly and certain technical or numerical simplifications are required, these NWPs are subject to some uncertainties. This is especially true for long forecast horizons (e.g., predicting the weather nine days ahead) as errors naturally grow over time.

However, these errors can often be corrected with expert knowledge, which improves the accuracy of the forecast. Meteorologists are trained to interpret NWP output and make corrections where needed. For example, if the NWP always predicts slightly too cold temperatures in a specific weather situation, an expert who knows previous events well can correct the most recent forecast when the same situation occurs again. An alternative is to use machine-learning techniques that follow the same idea. Based on historical NWP forecasts and observations, the algorithm learns in which situations the NWP was wrong, which makes it possible to adjust incoming forecasts accordingly.

As this work is a cooperation between different departments at the Universität Innsbruck, we made use of both approaches. While Georg J. Mayr (Professor at the Department of Atmospheric and Cryospheric Sciences) manually analyzed the latest forecasts using his expert knowledge in synoptics, Achim Zeileis (Professor at the Department of Statistics) and Reto Stauffer (Assistant Professor at the Digital Science Center and the Department of Statistics) implemented the machine-learning algorithm to tackle the question.

Defining the target

To have an objective target, we defined ‘white Christmas in Innsbruck’ as having a closed snow cover with a snow depth of one centimeter or more (\(\ge 1~cm\)) at Innsbruck Airport on December 24, 0600 UTC (seven o’clock local time).

Data

For training the machine-learning model, historical NWP forecasts and observations from the most recent seven years are used, \(\pm 20\) days around Christmas (December 4 to January 13), resulting in a total sample size of \(N = 259\).

The observations come from a weather station at Innsbruck Airport, which reports snow depth daily at 0600 UTC. Each report is turned into a binary variable: if a closed snow cover with a snow depth of \(\ge 1~cm\) is observed, the snow observation is “white”, and “no” otherwise.
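The conversion from a snow report to the binary target can be sketched in a few lines. This is only an illustration: the function name, the `closed_cover` flag and the example reports are made up here, and the handling of missing reports is an assumption, not something the post specifies.

```python
from typing import Optional

SNOW_DEPTH_THRESHOLD_CM = 1.0  # threshold from the target definition


def classify_snow_observation(snow_depth_cm: Optional[float],
                              closed_cover: bool) -> Optional[str]:
    """Return "white" or "no"; None if the snow depth report is missing."""
    if snow_depth_cm is None:
        # Assumption: missing reports are skipped rather than guessed.
        return None
    if closed_cover and snow_depth_cm >= SNOW_DEPTH_THRESHOLD_CM:
        return "white"
    return "no"


# Made-up example reports: (snow depth in cm, closed snow cover?)
reports = [(0.0, False), (0.5, True), (1.0, True), (12.0, True), (3.0, False)]
labels = [classify_snow_observation(depth, closed) for depth, closed in reports]
print(labels)
```

Note that a report with 3 cm of patchy snow still counts as “no” in this sketch, because the definition requires a closed snow cover as well as the minimum depth.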

The NWP forecasts are taken from the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble prediction system (0000 UTC run). As the latest ensemble forecast run is the one from December 15 when this blog post goes online, forecasts up to \(+222\) hours (nine days and six hours) are used to be able to predict December 24, 2022, 0600 UTC.
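The lead time follows directly from the initialization time and the target time. A small check using only the two timestamps given in the text (the variable names are chosen for illustration):

```python
from datetime import datetime, timedelta, timezone

# Initialization of the ensemble run and the target time, both in UTC
init_time = datetime(2022, 12, 15, 0, 0, tzinfo=timezone.utc)
target_time = datetime(2022, 12, 24, 6, 0, tzinfo=timezone.utc)

# Lead time in whole hours, then split into days and remaining hours
lead_time = target_time - init_time
lead_hours = int(lead_time.total_seconds() // 3600)
days, hours = divmod(lead_hours, 24)

print(f"+{lead_hours} h = {days} days and {hours} hours")
```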

A wide range of variables has been extracted from the NWP, such as temperatures at ground level and in the lower atmosphere, cloud cover, precipitation and snowfall, and humidity. In addition, aggregated variables have been calculated, covering either the full forecast horizon (up to \(+222\) hours) or the three days before the forecast step of interest (\(+150\) hours to \(+222\) hours). These include aggregated minima, maxima, averages, and/or sums, resulting in \(M = 92\) different input variables for the model.
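To illustrate what such aggregated variables might look like, here is a minimal sketch. The 6-hourly time step, the variable name `t2m` and all values are made up for this example; the post does not state the temporal resolution or which aggregates are used for which variable.

```python
from statistics import mean


def aggregate_window(forecast, start_hour, end_hour):
    """Summarize forecast values with lead time in [start_hour, end_hour]."""
    values = [value for step, value in sorted(forecast.items())
              if start_hour <= step <= end_hour]
    if not values:
        raise ValueError("no forecast steps inside the requested window")
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "sum": sum(values)}


# Made-up 2 m temperature forecast (deg C), one value every 6 hours up to +222 h
t2m = {step: round(-3.0 + 0.02 * step, 2) for step in range(0, 223, 6)}

full_horizon = aggregate_window(t2m, 0, 222)       # whole forecast horizon
last_three_days = aggregate_window(t2m, 150, 222)  # 72 hours before the target

print(full_horizon["min"], last_three_days["min"], last_three_days["max"])
```

Each aggregate of each window becomes one input column, which is how a moderate number of raw NWP variables can grow to a larger set of model inputs.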

Methodology: Random forests

As the method of choice, a random forest is used, a powerful and flexible, yet robust, machine-learning algorithm. To give a brief introduction to random forests: A random forest consists of a series of individual (but all slightly different) trees.

Each conditional inference tree is based on randomly selected variables from the NWP (\(15\) out of \(92\)). In each iteration, a test determines which of the input variables shows the strongest dependence on the target variable. If at least one variable shows such a dependence, a binary split is performed on the one with the strongest dependence, splitting the data into two subsets and creating two new nodes. This process is repeated until there is no more dependence between the target and the input variables, or other stopping criteria are met. A node that is not split any further is called a leaf. Each leaf contains a set of observations, each of which is either “white” or “no”. Thus, the relative frequency (or probability) of observing “white” can be calculated in each leaf.
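The two mechanical ingredients described above, drawing a random subset of variables and computing the relative frequency of “white” in a leaf, can be sketched as follows. This is a deliberately simplified illustration: the variable names and data are made up, and the split threshold is set by hand, whereas a conditional inference tree selects the split variable with a statistical test of dependence.

```python
import random


def leaf_probability(labels):
    """Relative frequency of "white" among the observations in one leaf."""
    return sum(label == "white" for label in labels) / len(labels)


def split_node(rows, variable, threshold):
    """Binary split: rows with variable <= threshold go left, the rest right."""
    left = [row for row in rows if row[variable] <= threshold]
    right = [row for row in rows if row[variable] > threshold]
    return left, right


# Draw 15 of 92 candidate variables (placeholder names x1 ... x92)
rng = random.Random(1)
all_variables = [f"x{i}" for i in range(1, 93)]
candidates = rng.sample(all_variables, 15)

# Made-up training rows: minimum temperature (deg C) and observed outcome
rows = [
    {"t2m_min": -6.0, "snow": "white"}, {"t2m_min": -4.0, "snow": "white"},
    {"t2m_min": -3.0, "snow": "no"},    {"t2m_min": -1.0, "snow": "white"},
    {"t2m_min": 1.0, "snow": "no"},     {"t2m_min": 2.0, "snow": "no"},
    {"t2m_min": 4.0, "snow": "no"},     {"t2m_min": 5.0, "snow": "no"},
]

# Hand-picked split at 0 deg C (an assumption, not a tested split)
left, right = split_node(rows, "t2m_min", 0.0)
p_left = leaf_probability([row["snow"] for row in left])
p_right = leaf_probability([row["snow"] for row in right])
print(p_left, p_right)
```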

Many of these trees together form a random forest. For this application, \(5000\) individual trees are used; one of them is shown below as an example.

 

A graph showing, as an example, one conditional inference tree from the random forest with 10 leaves (terminal nodes) and their corresponding probabilities

 

Once the random forest is estimated, variable importance can be calculated by performing a random permutation test. Without going into details: the values of one variable are randomly shuffled, and the more important that variable is, the more the accuracy of the random forest decreases. The image below shows the mean decrease in accuracy for the 15 most important variables of the estimated random forest.
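The idea behind permutation importance can be sketched without any machine-learning library. In the sketch below, `toy_model` is a hand-written stand-in for the fitted random forest, and the data and variable names are made up; only the shuffle-and-compare logic is the point.

```python
import random


def accuracy(model, X, y):
    """Share of rows for which the model predicts the observed label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, n_repeats=50, seed=0):
    """Mean decrease in accuracy after randomly shuffling one input column."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    decreases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the permuted column
        X_perm = [dict(row, **{column: value})
                  for row, value in zip(X, shuffled)]
        decreases.append(baseline - accuracy(model, X_perm, y))
    return sum(decreases) / n_repeats


def toy_model(row):
    """Stand-in for the forest: uses temperature only, ignores cloud cover."""
    return "white" if row["t2m_min"] <= 0.0 else "no"


pairs = [(-6, 0.9), (-4, 0.2), (-2, 0.7), (-1, 0.4),
         (1, 0.8), (2, 0.1), (4, 0.6), (5, 0.3)]
X = [{"t2m_min": t, "cloud_cover": c} for t, c in pairs]
y = [toy_model(row) for row in X]  # made-up labels that the model fits exactly

importance_t2m = permutation_importance(toy_model, X, y, "t2m_min")
importance_cloud = permutation_importance(toy_model, X, y, "cloud_cover")
print(importance_t2m, importance_cloud)
```

Shuffling the variable the model relies on lowers the accuracy, while shuffling a variable the model ignores leaves it unchanged, so its importance is zero.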

 

A graph showing the mean decrease in accuracy of the 15 most important variables of the estimated random forest in the form of a Christmas tree with green horizontal bars and a yellow star on top

 

 

Once a new NWP forecast becomes available, we can predict the outcome by passing the new forecast through each of the \(5000\) trees and checking which leaf (terminal node) the new observation falls into. Averaging over the outcomes of all trees gives the final probability of the event happening (“white”) or not happening (“no”).
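The averaging step can be sketched as follows. The three "trees" are hand-written stand-ins that return the probability of “white” in the leaf a forecast falls into; their variables, thresholds and leaf probabilities are invented for illustration and are not taken from the fitted forest.

```python
def tree_a(forecast):
    """Hypothetical tree splitting on minimum temperature."""
    return 0.75 if forecast["t2m_min"] <= 0.0 else 0.0


def tree_b(forecast):
    """Hypothetical tree splitting on accumulated snowfall."""
    return 0.6 if forecast["snowfall_sum"] >= 5.0 else 0.1


def tree_c(forecast):
    """Hypothetical tree splitting on mean cloud cover."""
    return 0.5 if forecast["cloud_cover_mean"] > 0.5 else 0.2


def predict_forest(trees, forecast):
    """Average the leaf probabilities of "white" over all trees."""
    leaf_probs = [tree(forecast) for tree in trees]
    return sum(leaf_probs) / len(leaf_probs)


# Made-up new forecast: mild, little snowfall, mostly cloudy
new_forecast = {"t2m_min": 2.0, "snowfall_sum": 1.0, "cloud_cover_mean": 0.8}

p_white = predict_forest([tree_a, tree_b, tree_c], new_forecast)
p_no = 1.0 - p_white
print(f"P(white) = {p_white:.3f}, P(no) = {p_no:.3f}")
```

With thousands of trees instead of three, the average becomes much less sensitive to the quirks of any single tree.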

Prediction: Will there be a white Christmas in Innsbruck?

To finally answer the question, a prediction was made for Christmas (December 24, 2022, 0600 UTC) based on the latest NWP forecast (initialized December 15, 2022, 0000 UTC). The random forest returns a probability of \(8.8\%\). Given the information we have today, nine days before Christmas, the chance of having a closed snow cover with a snow depth of \(\ge 1~cm\) is not zero, but not large either.

Does the expert meteorologist agree with the algorithm? Independently of the machine-learning approach, Georg J. Mayr went through a vast number of weather forecast maps on December 15, 2022. He concluded that, in the days leading up to the event, the notorious “Christmas thaw” weather will erase the remaining white, and he put the probability of a white Christmas at less than \(5\%\), close to the result from the random forest.

To know for sure whether December 24 will be a white Christmas and how good the forecasts are, we will have to wait for nine days (and watch the latest forecasts). Being able to give such a clear answer with our application is an exciting development. If you are interested in learning more about some of the topics covered, in much greater detail, please have a look at the suggested references below.

References

  • Schlosser L, Hothorn T, Stauffer R, Zeileis A (2019). Distributional regression forests for probabilistic precipitation forecasting in complex terrain. Annals of Applied Statistics, 13(3), 1564-1589. doi:10.1214/19-AOAS1247.
  • Stauffer R, Mayr GJ, Messner JW, Zeileis A (2018). Hourly Probabilistic Snow Forecasts over Complex Terrain: A Hybrid Ensemble Postprocessing Approach. Advances in Statistical Climatology, Meteorology and Oceanography, 4(1/2), 65-86. doi:10.5194/ascmo-4-65-2018.
  • Stauffer R, Umlauf N, Messner JW, Mayr GJ, Zeileis A (2017). Ensemble Postprocessing of Daily Precipitation Sums over Complex Terrain Using Censored High-Resolution Standardized Anomalies. Monthly Weather Review, 145(3), 955-969. doi:10.1175/MWR-D-16-0260.1.

 

DOI: https://www.doi.org/10.48763/000004

 

Portrait photo of Reto Stauffer

Written by Reto Stauffer in December 2022

Assistant Professor at DiSC & Department of Statistics

University of Innsbruck

About the author

I am working on the development and refinement of statistical machine-learning methods and their application in the field of natural and atmospheric sciences: spatio-temporal regression methods to enhance climatological estimates or probabilistic decision trees to improve weather forecasts. The development of these new methods is accompanied by implementing open-source software which is able to handle vast and continuously growing amounts of data in today’s digital world.

Research area

Data Science in Atmosphere Climate Research
