## Up close and intimate with the APCI model

This blog brings together two pieces of work. The first is the paper we presented to the Institute and Faculty of Actuaries, *"A stochastic implementation of the APCI model for mortality projections"*, which will appear in the British Actuarial Journal. The second is a previous blog where I examined the role of constraints in models of mortality. The present blog combines the two and looks at the application of constraints to the CMI's APCI model. I showed the results to a fellow statistician who expressed astonishment and even used the word "unbelievable"'. So buckle up, and prepare to be astonished!

We use data on UK males from the UK's Office for National Statistics. We have the number of deaths, \(d_{x,y}\), age \(x\) last birthday in year \(y\) and mid-year population estimates, the central exposures \(E^c_{x+1/2, y+1/2}\) for ages 50100 and for years 19712015. We assume that \(D_{x,y}\), the random variable corresponding to the observed deaths \(d_{x,y}\), follows the Poisson distribution with mean \(E^c_{x+1/2, y+1/2}\mu_{x+1/2, y+1/2}\), where \(\mu_{x+1/2, y+1/2}\) is the force of mortality at exact age \(x+1/2\) and exact time \(y+1/2\).

The CMI's APCI model is:

\[\log\,\mu_{i+1/2,\,j+1/2} = \alpha_i + \kappa_j + \gamma_{c(i,j)} + \beta_i(\bar y - y_j)\qquad (1).\]

Here we index the ages \(i = 1,\ldots,n_a\), and the years \(j = 1,\ldots, n_y\), where \(n_a = 51\) is the number of ages and \(n_y = 45\) is the number of years. The cohorts are indexed from one to \(n_c = n_a+n_y-1 = 95\) with the oldest cohort in the first year assigned the index one. With this convention \(c(i,j)= n_a-i+j\).

The APCI model is clever, since it combines a linearized version of the Lee-Carter term \(\beta_i \kappa_j \approx \kappa_j + \beta_i(\bar y - y_j)\), with a cohort term \(\gamma_{c(i,j)}\). The model in this form is an example of a generalized linear model. The CMI then goes one step further and smooths \(\boldsymbol{\alpha}\), \(\boldsymbol{\kappa}\), \(\boldsymbol{\gamma}\) and \(\boldsymbol{\beta}\). Here the CMI and I part company. At the risk of teaching grandma to suck eggs, insurance is about the proper assessment of risk; smoothing the \(\boldsymbol{\kappa}\) and \(\boldsymbol{\gamma}\) terms suppresses the stochastic element of the model. Any resulting forecast of mortality will appear more certain than the data justify. Our paper puts the stochastic element back where it belongs. Smoothing \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) does not affect the stochastic element in (1), but has the desirable property of producing more regular forecasts. We will smooth the \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) terms. We use the method of \(P\)-splines (Eilers and Marx, 1996) with cubic \(B\)-splines and a second-order penalty on both \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\).

Here is the first surprise. Model (1) requires five constraints to enable unique estimates of the parameters to be found, but we only require four when \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) are smoothed with a second-order penalty. There is an interaction between the model matrix and the penalty that reduces the effective rank of the model. This was completely unexpected. Note that this is a different point on identifiability constraints from the one made in Richards et al (2019) there we observed that you do not need all five constraints if you don't estimate effects for corner cohorts. In fact both points are related mathematically, and a full explanation behind this behaviour is given in Currie (submitted).

The following four constraints achieve identifiability in the smooth version of (1):

\[\sum \kappa_j = \sum \gamma_c = \sum c \gamma_c = \sum c^2 \gamma_c = 0\qquad (2).\]

We refer to these constraints as the *standard constraints*.

However, the perverse soul from my previous blog is up to his old tricks and instead places four sets of *random constraints* on all the parameters. He argues that the choice of constraints doesn't make any difference to the *estimates and forecasts of mortality*, so a random set does the job just as well as a particular set such as (2). Estimates of the parameters under the standard constraints are denoted by suffix \(s\) as in \(\boldsymbol{\hat\alpha_s}\), and by suffix \(r\) as in \(\boldsymbol{\hat\alpha_r}\), for the random constraints.

So what happens? Figure 1 shows the two sets of estimates. The estimates are spectacularly different *in both shape and scale*. Surely there is something wrong? Interestingly, no Figure 2 shows the observed and fitted \(\log(\mu)\). Despite the differences in the estimated parameters, the fitted values under both sets of constraints are identical.

Figure 1. Estimates of \(\boldsymbol{\alpha}\), \(\boldsymbol{\kappa}\), \(\boldsymbol{\gamma}\) and \(\boldsymbol{\beta}\) under the standard and random constraints.

Figure 2. Observed and estimated values of \(\log(\mu)\) under standard and random constraints for ages 50 (bottom) to 100 (top) in steps of ten years.

But there is more. Forecasting with an ARIMA model with either set of parameters gives identical forecasts. This seems a bit of a stretch, given the differences between the two sets of estimates. The point is that these estimates are closely related and these relationships orchestrate the behaviour of the forecasts. To be precise, the difference between \(\boldsymbol{\hat\beta_s}\) and \(\boldsymbol{\hat\beta_r}\) is a constant (here 0.0415) while the differences in the other sets are all linear; for example, here \(\boldsymbol{\hat\alpha_s} - \boldsymbol{\hat\alpha_r}=-0.1790 \times {\rm age} + 6.822\).

In conclusion, the two sets of estimates do much more than merely hold hands they are up close and intimate, and this intimacy ensures that estimates and forecasts are inevitably identical. There is some very beautiful mathematics controlling this behaviour. There has been some discussion in the literature about the choice of constraints; for example, whether constraints on cohort parameters should be weighted by the number of times a cohort appears in the data. This discussion is now seen to be redundant. You can even follow the perverse soul and use random constraints!

**References**

Currie, I. D. (in preparation). Constraints, the identifiability problem and the forecasting of mortality.

Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with *B*-splines and penalties, *Statistical Science*, **11**, 89121.

R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.r-project.org/.

Richards, S. J., Currie, I. D., Kleinow, T. and Ritchie, G. P. (2019). A stochastic implementation of the APCI model for mortality projections, *British Actuarial Journal* (to appear).

### Comments