Residual concerns

One of the most important means of checking a model's fit is to look at the residuals, i.e. the standardised differences between the actual data observed and what the model predicts.  One common definition, known as the Pearson residual, is as follows:

$$r = \frac{D - E}{\sqrt{E}}$$

where r is the residual, D is the observed number of deaths and E is the expected number of deaths. This definition is quick and easy to apply, and works well where there are relatively large numbers of observed and expected deaths.  If the underlying model used to generate the expected values in E is correct, the residuals should have an approximate N(0, 1) distribution.  The sum of the r² values can be compared with the appropriate point of a χ² (chi-squared) distribution to test for fit.
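As a rough illustration, the calculation can be sketched in a few lines of Python. This assumes the usual Pearson form r = (D - E)/sqrt(E); the death counts, the expected values and the function name are hypothetical, chosen only to show the mechanics.

```python
import math

def pearson_residual(deaths, expected):
    """Pearson residual r = (D - E) / sqrt(E) for a single cell."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return (deaths - expected) / math.sqrt(expected)

# Hypothetical observed (D) and expected (E) deaths for four age bands.
observed = [12, 30, 45, 61]
expected = [10.0, 33.5, 41.2, 64.0]

residuals = [pearson_residual(d, e) for d, e in zip(observed, expected)]

# Sum of squared residuals, to be compared with a chi-squared critical value.
# The degrees of freedom depend on the number of cells and on how many
# parameters were fitted, so they are left to the analyst here.
chi_squared = sum(r * r for r in residuals)

for d, e, r in zip(observed, expected, residuals):
    print(f"D={d:3d}  E={e:6.1f}  r={r:+.3f}")
print(f"chi-squared statistic = {chi_squared:.3f}")
```

The sketch stops at the test statistic on purpose: looking up the critical value or p-value needs a statistics library or tables, and the right degrees of freedom depend on the model being tested.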

The Pearson definition above depends on a Normal approximation that only becomes reliable as the counts grow, so it works less well where the number of deaths in each category is relatively small.  One solution is to collapse data across categories to get the number of deaths large enough so that the approximation holds.  However, this restricts your ability to look at localised areas of the model fit.
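To make the trade-off concrete, here is a minimal sketch of collapsing adjacent categories. The function name, the data and the threshold of 5 expected deaths per cell are all assumptions for illustration (5 is a common rule of thumb, not something prescribed above).

```python
def collapse_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells until each merged cell has E >= min_expected.

    Returns a list of (D, E) tuples. Any leftover cells at the end that
    fall short of the threshold are folded into the previous merged cell.
    """
    merged = []
    d_sum, e_sum = 0, 0.0
    for d, e in zip(observed, expected):
        d_sum += d
        e_sum += e
        if e_sum >= min_expected:
            merged.append((d_sum, e_sum))
            d_sum, e_sum = 0, 0.0
    if d_sum or e_sum:
        if merged:
            last_d, last_e = merged.pop()
            merged.append((last_d + d_sum, last_e + e_sum))
        else:
            merged.append((d_sum, e_sum))
    return merged

# Hypothetical small-scheme data: six single-age cells with few deaths.
observed = [0, 1, 2, 1, 4, 3]
expected = [0.8, 1.5, 2.1, 1.9, 3.6, 2.2]

cells = collapse_cells(observed, expected)
print(f"{len(observed)} cells collapsed to {len(cells)}")
```

With this hypothetical data, six single-age cells shrink to two, which shows the loss of resolution: a poor fit at one particular age can no longer be seen.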

Small or medium-sized pension schemes usually do have small numbers of deaths, and we would prefer not to have to collapse across groups if it could be avoided.  Fortunately, there is a much better definition of a residual, known as the deviance residual.  Below is the definition of the deviance residual for a Poisson variable:

$$r = \mathrm{sign}(D - E)\,\sqrt{2\left[D \log\left(\frac{D}{E}\right) - (D - E)\right]}$$

and the following definition applies for a Binomial variable with a sample size of n:

$$r = \mathrm{sign}(D - E)\,\sqrt{2\left[D \log\left(\frac{D}{E}\right) + (n - D) \log\left(\frac{n - D}{n - E}\right)\right]}$$

where sign(D-E) takes the value 1 when D>E, -1 when D<E and zero when D=E.  There is little difference between Pearson residuals and deviance residuals where the number of deaths is large, but the deviance residual has better theoretical properties when the number is small.  There is a small amount of extra programming for deviance residuals, but it is worth it to avoid the limitations of Pearson residuals for small data sets.

And in case you are wondering, the value of D * log (D/E) is taken to be zero when D=0, which is its limit as D tends to zero.
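The extra programming is modest. The sketch below assumes the standard textbook forms of the Poisson and Binomial deviance residuals and the zero convention for D * log(D/E) just mentioned; the function names and example figures are hypothetical.

```python
import math

def _x_log_x_over_y(x, y):
    """Return x * log(x / y), taken to be 0 when x == 0."""
    return 0.0 if x == 0 else x * math.log(x / y)

def sign(x):
    """1 when x > 0, -1 when x < 0 and 0 when x == 0."""
    return (x > 0) - (x < 0)

def pearson_residual(deaths, expected):
    """(D - E) / sqrt(E), included only for comparison."""
    return (deaths - expected) / math.sqrt(expected)

def poisson_deviance_residual(deaths, expected):
    """sign(D - E) * sqrt(2 * [D log(D/E) - (D - E)])."""
    dev = 2.0 * (_x_log_x_over_y(deaths, expected) - (deaths - expected))
    # max() guards against a tiny negative value from rounding when D is close to E.
    return sign(deaths - expected) * math.sqrt(max(dev, 0.0))

def binomial_deviance_residual(deaths, expected, n):
    """sign(D - E) * sqrt(2 * [D log(D/E) + (n-D) log((n-D)/(n-E))]).

    Assumes 0 < expected < n.
    """
    dev = 2.0 * (_x_log_x_over_y(deaths, expected)
                 + _x_log_x_over_y(n - deaths, n - expected))
    return sign(deaths - expected) * math.sqrt(max(dev, 0.0))

# Hypothetical cells: one with many deaths, one with none.
for d, e in [(1050, 1000.0), (0, 2.0)]:
    print(f"D={d:5d}  E={e:7.1f}  "
          f"Pearson={pearson_residual(d, e):+.3f}  "
          f"deviance={poisson_deviance_residual(d, e):+.3f}")
```

For the large cell the two residuals nearly coincide, while for the cell with no deaths they differ noticeably, in line with the point above that the choice of definition matters mainly for small counts.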




Stephen Richards
Stephen Richards is the Managing Director of Longevitas
Residuals in Longevitas

Longevitas calculates and displays deviance residuals for all model fits in the Charts tab.  Residuals can be plotted against many variables, including age and time.  Statistical tests are automatically applied to residuals, and the raw underlying data can be downloaded for further analysis if desired.