Forecasting with penalty functions - Part III

This is the last of my three blogs on forecasting with penalties. I discussed the 1-d case in the first blog and the 2-d case in the second. Here we discuss some of the properties of 2-d forecasting. Some readers may find some of my remarks surprising, even paradoxical.

In our first blog we used the Lee-Carter model as an example where a time series is used to forecast mortality. The method is (a) estimate the parameters in the model by fitting the model to suitable data and (b) forecast a subset of the parameters with a suitable time series. The fit to data, by definition, does not depend on the forecast horizon. This is a familiar and attractive property; we will refer to this as the invariance property. It is easy to overlook the fact that a price has been paid for this invariance: we use distinct processes in the data and forecast regions, i.e., there is a discontinuity between the model used in the data region and the method used to forecast.

Forecasting with penalties is a seamless method, by which we mean fitting and forecasting are done simultaneously; the model both fits and forecasts. We have avoided the discontinuity referred to in the previous paragraph, but as a consequence there is no reason a priori that the invariance property should hold. Nevertheless, in 1-d we saw that the invariance property does hold. However, it does NOT hold in 2-d, and the purpose of this blog is to explain (a) why it cannot hold and (b) why we should be quite relaxed about this.

Let us suppose for the moment that we are wedded to the idea of invariance. We saw in my blog on 2-d modelling that we should think of our regression coefficients as arranged on a grid. The blog on 1-d modelling and forecasting then tells us that we will achieve invariance if we perform 1-d forecasting on each row of coefficients; this will produce linear forecasts at each age. It is straightforward to write down the penalty matrix that achieves this, and Figure 1 shows the results; we call this penalty the invariant penalty. We have used Australian male mortality data for ages 50 to 90, and years 1960 to 2010; the forecast is to 2060. The result is not acceptable: the forecasts for ages 50 and 60 cross in 2057. It is easy to see why this has occurred. There is a flattening of the age 50 mortality from 2000 to 2010; the 2-d model, as a local model, is sensitive to this flattening and picks it up. The forecast at age 50 then extrapolates this local behaviour over the whole forecast period.

Figure 1: Observed, fitted and forecast mortality for Australian males with the invariant penalty.

The invariant penalty has many attractive properties.

  • The fitted values in the data region are invariant with respect to the forecast horizon.
  • Forecast values to 2030, say, are invariant with respect to any forecast horizon greater than 2030.
  • The effective dimension, the deviance and the smoothing parameters are invariant with respect to the forecast horizon and equal to the values found when fitting to data only.
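
The first two properties can be checked numerically in a toy setting. The sketch below is not the model behind Figure 1. It assumes a Gaussian response, an identity basis (one coefficient per age-year cell), second-order difference penalties, and one possible form of invariant penalty in which the age penalty is simply switched off in the forecast years. The function names and the synthetic data are illustrative only.

```python
import numpy as np


def diff_matrix(n, order=2):
    """Difference matrix of the given order acting on n coefficients."""
    return np.diff(np.eye(n), n=order, axis=0)


def build_penalties(n_age, n_data, n_total, lam_a, lam_y):
    """Return (full 2-d penalty, assumed invariant penalty).

    Coefficients sit on an n_age x n_total grid, stacked column by column
    (all ages for the first year, then all ages for the second year, ...).
    """
    Pa = diff_matrix(n_age).T @ diff_matrix(n_age)      # roughness across ages
    Py = diff_matrix(n_total).T @ diff_matrix(n_total)  # roughness across years
    year_part = lam_y * np.kron(Py, np.eye(n_age))
    full = lam_a * np.kron(np.eye(n_total), Pa) + year_part
    # Assumption: the invariant penalty drops the age penalty in forecast years.
    in_data = np.r_[np.ones(n_data), np.zeros(n_total - n_data)]
    invariant = lam_a * np.kron(np.diag(in_data), Pa) + year_part
    return full, invariant


def fit_and_forecast(y, horizon, lam_a=10.0, lam_y=10.0, invariant=False):
    """Penalised least squares over the data and forecast regions together.

    y is an n_age x n_data array; forecast cells carry weight zero.
    Returns the n_age x (n_data + horizon) fitted surface.
    """
    n_age, n_data = y.shape
    n_total = n_data + horizon
    full, inv = build_penalties(n_age, n_data, n_total, lam_a, lam_y)
    P = inv if invariant else full
    w = np.zeros((n_age, n_total))
    w[:, :n_data] = 1.0
    y_pad = np.zeros((n_age, n_total))
    y_pad[:, :n_data] = y
    W = np.diag(w.ravel(order="F"))
    theta = np.linalg.solve(W + P, W @ y_pad.ravel(order="F"))
    return theta.reshape((n_age, n_total), order="F")


def max_change_in_data_region(y, horizon, invariant):
    """Largest absolute change in the data-region fit caused by forecasting."""
    base = fit_and_forecast(y, 0, invariant=invariant)
    ext = fit_and_forecast(y, horizon, invariant=invariant)
    return np.max(np.abs(ext[:, : y.shape[1]] - base))


# Synthetic "log mortality": rises with age, falls with year, plus noise.
rng = np.random.default_rng(0)
age, year = np.meshgrid(np.arange(8), np.arange(12), indexing="ij")
y = -6.0 + 0.3 * age - 0.05 * year + rng.normal(scale=0.1, size=age.shape)

print("invariant penalty:", max_change_in_data_region(y, 10, invariant=True))
print("full 2-d penalty: ", max_change_in_data_region(y, 10, invariant=False))
```

In this toy setting the first number should be zero up to rounding error, while the second should not. Smoothing parameters are held fixed here, so the third property (invariance of the selected smoothing parameters) is not tested.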

The problem with Figure 1 is that there is nothing in the invariant penalty to control the forecasts across ages. We need a penalty which (a) forecasts and (b) preserves the structure across ages; this is precisely what the 2-d penalty achieves (Currie, Durban and Eilers, 2004). The forecasts at ages 50 and 60 will be kept apart and a balance is struck between forecasting in time and preserving the structure across ages.
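
For reference, the 2-d penalty of Currie, Durban and Eilers (2004) can be written in Kronecker-product form. The notation here is an assumption made for this sketch: the coefficients are stacked year by year, with \(c_a\) coefficients by age and \(c_y\) by year, and \(\boldsymbol{D}_a\) and \(\boldsymbol{D}_y\) are difference matrices acting on ages and years respectively.

\[
\boldsymbol{P} = \lambda_a \, \boldsymbol{I}_{c_y} \otimes \boldsymbol{D}_a^\prime \boldsymbol{D}_a + \lambda_y \, \boldsymbol{D}_y^\prime \boldsymbol{D}_y \otimes \boldsymbol{I}_{c_a}
\]

The first term penalises roughness across ages in every column of the coefficient grid, forecast columns included; the second penalises roughness across years in every row. It is the first term, acting in the forecast region, that holds the forecasts at neighbouring ages together.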

None of the "attractive properties" listed above holds. Some readers may find this disconcerting. Most, maybe even all, other methods of forecasting fit to data and then use a time series to forecast: invariance becomes the conventional wisdom. However, with 2-d methods a single surface is fitted over the entire data and forecast region. The penalty sets up a tension across the whole surface with the result that the forecast affects the fit in the data region; it is a kind of feedback mechanism familiar throughout the natural world. Figure 2 shows the fit and forecast with the 2-d penalty. Everything has turned out well.

Figure 2: Observed, fitted and forecast mortality for Australian males with the original 2-d penalty.

Table 1 compares the forecasts for 10, 25 and 50 years. Although we no longer have exact invariance we have what we might call statistical invariance, by which we mean that none of the differences is statistically significant, in the sense that all BIC values are very similar (recall that two models whose BIC values are fewer than four units apart are not significantly different). The final column is probably of most interest to actuaries. Let \(\hat {\boldsymbol{\Lambda}}\) denote the table of fitted log mortalities when there is no forecasting and let \(\hat {\boldsymbol{\Lambda}}_F\) denote the corresponding table, over the same ages and years, when forecasting is to year \(F,\, F = 2020, 2035, 2060\). The final column of Table 1 gives values of \(\max(\mbox{abs}((\hat {\boldsymbol{\Lambda}}_F - \hat {\boldsymbol{\Lambda}})/\hat {\boldsymbol{\Lambda}})) \times 100\), where the division is element by element; these differences are very small indeed.

Table 1: Smoothing parameters, \(\lambda_a\) and \(\lambda_y\), effective dimension, ED, deviance, Dev, Bayesian Information Criterion, BIC, and %-differences for various forecast horizons.

horizon, \(F\)   \(\lambda_a\)   \(\lambda_y\)   ED     Dev      BIC      % difference
2010             94.9            546.1           55.2   4272.4   4694.6   \(-\)
2020             91.5            558.5           54.9   4274.2   4693.6   0.07
2035             86.1            577.5           54.8   4275.5   4694.2   0.16
2060             87.3            570.4           54.8   4275.5   4694.4   0.18
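
The two checks behind "statistical invariance" can be sketched in a few lines. The BIC values below are copied from Table 1; the small tables of log mortality are hypothetical and serve only to show how the final column is computed. The function name is illustrative.

```python
import numpy as np

# BIC values copied from Table 1, keyed by forecast horizon.
bic = {2010: 4694.6, 2020: 4693.6, 2035: 4694.2, 2060: 4694.4}
bic_spread = max(bic.values()) - min(bic.values())


def max_pct_difference(fitted_forecast, fitted_data_only):
    """Largest absolute relative change in fitted log mortality, in percent.

    Both arguments must cover the same ages and years (the data region).
    """
    rel = (fitted_forecast - fitted_data_only) / fitted_data_only
    return float(np.max(np.abs(rel)) * 100)


# Hypothetical 2 x 2 tables of log mortality, for illustration only.
lam_hat = np.array([[-5.0, -4.0], [-3.0, -2.0]])
lam_hat_f = lam_hat * 1.001  # every entry shifted by 0.1%

print("BIC spread:", round(bic_spread, 1))
print("max % difference:", round(max_pct_difference(lam_hat_f, lam_hat), 3))
```

The spread of the BIC values in Table 1 is well inside the four-unit rule of thumb quoted above.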

To sum up: 2-d forecasting may only be statistically invariant but this is a small price to pay for a single minimum tension surface across the whole of the data and forecast region. As W. S. Gilbert remarked in The Pirates of Penzance: it is a most ingenious paradox!

Acknowledgement: I am very grateful to Maria Durban, Carlos III University, Madrid, and Dae-Jin Lee, Basque Center for Applied Mathematics, Bilbao, for drawing my attention to their work on the invariant penalty.


Currie, Durban & Eilers (2004). Smoothing and forecasting mortality rates, Statistical Modelling, 4, 279-298.

Human Mortality Database. University of California, Berkeley, USA. Available at (data downloaded 2015).




Iain Currie
Iain Currie is an Honorary Research Fellow in the School of Mathematical and Computer Sciences at Heriot-Watt University