## Getting to the root of time-series forecasting

When using a stochastic model for mortality forecasting, people can either use penalty functions or time-series methods. Each approach has its pros and cons, but time-series methods are the more common. I demonstrated in an earlier posting how an ARIMA time-series model can be a better representation of a mortality index than a random walk with drift. In this posting we will examine the structure of an ARIMA model and how one might go about selecting and fitting it.

Assume we have an index at time $$t$$, $$\kappa_t$$, and an error term, $$\epsilon_t$$ ($$\kappa_t$$ could be the mortality index in the Lee-Carter model, for example). For mortality applications the simplest non-trivial forecasting model is the random walk with drift:

$\kappa_t=\kappa_{t-1}+\mu+\epsilon_t\qquad(1)$

where $$\mu$$ is the drift constant (which usually has a negative value in modern populations, representing a general trend for falling mortality). Another way to write the drift model in equation (1) is as follows:

$\kappa_t-\kappa_{t-1} = \mu+\epsilon_t\qquad(2)$

or

$(1-L)\kappa_t = \mu+\epsilon_t\qquad(3)$

where $$L$$ is the lag operator and is defined as $$L^i\kappa_t=\kappa_{t-i}$$. An ARIMA($$p, d, q$$) model is a generalisation of equation (3):

$\left(1-\sum_{i=1}^p\phi_iL^i\right)(1-L)^d\kappa_t = \mu + \left(1+\sum_{i=1}^q\theta_iL^i\right)\epsilon_t\qquad(4)$

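or, more compactly, writing $$\phi_p(L)$$ and $$\theta_q(L)$$ for the two bracketed lag polynomials and omitting the drift term $$\mu$$: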

$\phi_p(L)(1-L)^d\kappa_t = \theta_q(L)\epsilon_t\qquad(5)$

where $$p$$ is the number of autoregressive terms, $$q$$ is the number of moving-average terms and $$d$$ is the differencing order. If $$d=1$$ the ARIMA process models mortality improvements, whereas if $$d=2$$ the ARIMA process models the rate of change in mortality improvements. $$\phi_p(L)=0$$ is the characteristic equation of the autoregressive part of the ARIMA process, while $$\theta_q(L)=0$$ is the characteristic equation of the moving-average part.
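As a minimal sketch of what the differencing operator $$(1-L)^d$$ does in practice, the Python snippet below simulates a random walk with drift and then differences it once and twice. The function names, the simulated parameter values and the use of `numpy` are assumptions made for illustration only; they do not come from any particular mortality data set.

```python
import numpy as np

def simulate_random_walk_with_drift(n_years, mu, sigma, kappa0=0.0, seed=0):
    """Simulate kappa_t = kappa_{t-1} + mu + eps_t with eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n_years)
    # Each step adds the drift plus a random shock to the previous value.
    return kappa0 + np.cumsum(mu + eps)

def difference(kappa, d=1):
    """Apply (1 - L)^d to the series; the result has d fewer elements."""
    return np.diff(kappa, n=d)

# Illustrative (made-up) parameters: negative drift, i.e. falling mortality.
kappa = simulate_random_walk_with_drift(n_years=50, mu=-1.5, sigma=0.5)

improvements = difference(kappa, d=1)    # (1 - L) kappa_t
rate_of_change = difference(kappa, d=2)  # (1 - L)^2 kappa_t

# Under a random walk with drift the first differences are mu + eps_t,
# so their sample mean is a natural estimate of the drift.
mu_hat = improvements.mean()
print(f"estimated drift: {mu_hat:.3f}")
```

With $$d=1$$ the differenced series is what an ARMA process would be fitted to in an ARIMA($$p,1,q$$) model; with $$d=2$$ it is the differences of those differences.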

Technical analysis of ARIMA models centres on the roots of the characteristic equations. These roots aren't the same thing as the coefficients $$\phi_i$$ and $$\theta_i$$, since the coefficients are always real-valued numbers whereas the roots of the characteristic equations can be either real or complex. A particular concern is that the autoregressive part of the fitted model should have no unit roots, nor any roots inside the unit circle: all the roots of $$\phi_p(L)=0$$ must be strictly greater than one in magnitude, or else the ARMA process $$(1-L)^d\kappa_t$$ will not be stationary.
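The stationarity condition can be checked numerically. The sketch below assumes the autoregressive polynomial has the form $$1-\sum_{i=1}^p\phi_iL^i$$ used in equation (4); the function names and the example coefficients are hypothetical, chosen only to show real roots, complex roots and a unit root.

```python
import numpy as np

def ar_characteristic_roots(phi):
    """Roots of 1 - phi_1 z - ... - phi_p z^p = 0."""
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant.
    coefficients = np.concatenate((-phi[::-1], [1.0]))
    return np.roots(coefficients)

def is_stationary(phi, tol=1e-8):
    """True if every root lies strictly outside the unit circle."""
    roots = ar_characteristic_roots(phi)
    return bool(np.all(np.abs(roots) > 1.0 + tol))

# Hypothetical coefficient sets, for illustration only.
examples = {
    "real roots": [0.5, 0.3],
    "complex roots": [1.0, -0.5],
    "unit root": [1.0],
}
for label, phi in examples.items():
    roots = ar_characteristic_roots(phi)
    print(label, np.abs(roots), is_stationary(phi))
```

Note that the second example has real coefficients but a complex-conjugate pair of roots, which is why the check is made on the magnitude of each root rather than on its value.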

Another potential issue is roots which cancel on both sides of equation (5). For example, building on equation (5) we can create an ARIMA($$p+1,d,q+1$$) model as follows:

$(1-\gamma L)\phi_p(L)(1-L)^d\kappa_t = (1-\gamma L)\theta_q(L)\epsilon_t\qquad(6)$

which is obviously identical to equation (5) for any value of $$\gamma$$. Since $$\gamma$$ is a superfluous parameter in equation (6) (it won't change the fit), this particular ARIMA($$p+1,d,q+1$$) model is over-parameterised. How do we guard against this in practice? After all, equation (6) should produce exactly the same fitted values for $$\hat\kappa_t$$. One answer is to use an information criterion, which balances goodness of fit against model complexity ("model complexity" is usually quantified as some function of the number of parameters). Since equation (6) fits no better than equation (5) but has more parameters, the over-parameterised model is penalised in favour of its more-parsimonious relative in equation (5).
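To make the cancellation concrete, the following sketch multiplies both lag polynomials by a common factor $$(1-\gamma L)$$ and confirms that the autoregressive and moving-average parts then share a root at $$1/\gamma$$. The coefficient values and helper names are hypothetical, and lag polynomials are stored as coefficients in ascending powers of $$L$$.

```python
import numpy as np

def lag_polynomial_product(a, b):
    """Multiply two lag polynomials given in ascending powers of L."""
    return np.convolve(a, b)

def roots_of_lag_polynomial(coefs):
    # np.roots wants descending powers, so reverse the ascending coefficients.
    return np.roots(np.asarray(coefs, dtype=float)[::-1])

# Hypothetical ARMA(1,1) for the differenced series.
phi = np.array([1.0, -0.6])    # 1 - 0.6 L
theta = np.array([1.0, 0.3])   # 1 + 0.3 L

gamma = 0.4                    # arbitrary value; any gamma cancels
common = np.array([1.0, -gamma])

phi_big = lag_polynomial_product(common, phi)      # AR side with the extra factor
theta_big = lag_polynomial_product(common, theta)  # MA side with the extra factor

ar_roots = roots_of_lag_polynomial(phi_big)
ma_roots = roots_of_lag_polynomial(theta_big)

# Roots appearing on both sides cancel and add nothing to the fit.
shared = [r for r in ar_roots if np.any(np.isclose(r, ma_roots))]
print("shared roots:", shared)
```

Because the larger model has the same likelihood but one extra parameter on each side, a criterion such as the AIC, which adds 2 for every parameter, will always rank it behind the smaller model.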

The selection process for an ARIMA model for $$\kappa_t$$ can therefore be automated to a large extent by targeting an information criterion. However, analysts also need to be wary of near-cancelling roots; these arise as in equation (6), but with two slightly different values of $$\gamma$$ on the left- and right-hand sides, $$\gamma_l$$ and $$\gamma_r$$ say. The values might be just different enough to allow the information criterion to be very slightly lower than for equation (5). Checks for this cannot be easily automated, however, so the analyst will have to manually check that there are not simpler ARIMA models within a few information-criterion units of the minimum.
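The manual check can at least be supported by listing every candidate that lies close to the minimum, simplest first. The sketch below is one way to do this; the two-unit window, the use of $$p+q$$ as the complexity measure and the information-criterion values themselves are all assumptions made for illustration, not results from any real fit.

```python
def n_arma_params(order):
    """Number of autoregressive plus moving-average terms in (p, d, q)."""
    p, _, q = order
    return p + q

def shortlist(candidates, window=2.0):
    """Return (order, ic) pairs within `window` units of the minimum IC,
    sorted so that the most parsimonious model comes first."""
    best = min(ic for _, ic in candidates)
    close = [(order, ic) for order, ic in candidates if ic - best <= window]
    return sorted(close, key=lambda c: (n_arma_params(c[0]), c[1]))

# Made-up information-criterion values, purely for illustration.
candidates = [
    ((1, 1, 1), 101.3),
    ((2, 1, 2), 100.9),  # lowest IC, but two more parameters than (1, 1, 1)
    ((0, 1, 0), 112.4),
    ((3, 1, 1), 104.8),
]

for order, ic in shortlist(candidates):
    print(order, ic)
```

Here the ARIMA(2,1,2) model has the lowest value, but the ARIMA(1,1,1) model sits within the window with fewer parameters, which is the situation that should prompt the analyst to inspect the roots of the larger model for near-cancellation.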

The best-fitting ARIMA model for a time-series projection can be selected automatically by targeting an information criterion like the AIC, AICc or BIC. The user can control the search space for ($$p,d,q$$), and plots are provided for each combination along with the information criterion being targeted.