Groups v. individuals

We have previously shown how survival models based on the force of mortality, μx, can use more of your data.  We have also seen that attempting to use fractional years of exposure in a qx model can lead to mistakes.  However, the Poisson distribution also uses μx, so why not use a Poisson model for the grouped count of deaths in each cell?  After all, a model using grouped counts sounds like it might fit faster.  In this article we show why survival models constructed at the level of the individual are still preferable.

The first step when using the Poisson model is to decide on the width of the age interval.  This is necessary because the Poisson model for grouped counts requires the force of mortality to be constant within each interval.  Table 1 shows the parameters for a Gompertz force of mortality using various widths of age interval:

Table 1. Parameters for a Poisson GLM for grouped counts using different widths of age interval. Source: Longevitas Ltd, using mortality-experience data of 593 deaths among 7,363 lives in a UK defined-benefit pension scheme for ages 60–90 over the quadrennium 2006–2009.

Width of age interval (years)    α          β
5.0                              -12.138    0.11676
1.0                              -12.439    0.11772
0.2                              -12.504    0.11796


For comparison, the same parameters in a full Gompertz survival model using individual data are α=-12.455 and β=0.11793.  We can see that the value for β under the Poisson approximation gets closer with shorter age intervals, but the same is not true for α: it improves when the interval is cut from 5.0 to 1.0 years, then overshoots the survival-model value at 0.2 years.  We need to look at the Poisson model in more detail to understand why it is an imperfect approximation for the survival model.
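The comparison can be checked directly from the figures quoted above.  The short Python sketch below is illustrative only: it uses nothing beyond the Table 1 estimates and the survival-model parameters, and the variable names are our own.

```python
# Estimates from Table 1: (interval width in years, alpha, beta).
poisson_fits = [
    (5.0, -12.138, 0.11676),
    (1.0, -12.439, 0.11772),
    (0.2, -12.504, 0.11796),
]

# Survival-model estimates quoted in the text.
ALPHA_SURVIVAL, BETA_SURVIVAL = -12.455, 0.11793

# Absolute distance of each Poisson estimate from the survival-model value.
alpha_gaps = [abs(a - ALPHA_SURVIVAL) for _, a, _ in poisson_fits]
beta_gaps = [abs(b - BETA_SURVIVAL) for _, _, b in poisson_fits]

for (width, _, _), da, db in zip(poisson_fits, alpha_gaps, beta_gaps):
    print(f"width {width:>3}: |alpha gap| = {da:.3f}, |beta gap| = {db:.5f}")
```

The gap for β shrinks at every step (0.00117, 0.00021, 0.00003), whereas the gap for α falls from 0.317 to 0.016 and then widens again to 0.049.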

A key assumption in the Poisson model is that the force of mortality is piecewise constant, i.e. constant within each age interval.  Since mortality rates increase exponentially with age, relatively short age intervals are required.  Of course, as we split up the age range into smaller intervals, we also reduce the potential computational savings relative to the survival model.  In fact, in our R test scripts the Poisson model actually takes longer to fit than the survival model because of the extra program time spent preparing the data.
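To get a feel for how much the piecewise-constant assumption costs, the sketch below compares the exact integrated Gompertz hazard over an age interval with the value obtained by holding the hazard constant.  Two assumptions here are ours rather than the article's: the constant is taken to be the hazard at the mid-point of the interval, and a life is assumed to be exposed for the whole interval.  The parameters are the survival-model estimates quoted earlier, and the function names are purely illustrative.

```python
import math

ALPHA, BETA = -12.455, 0.11793  # survival-model estimates quoted in the text


def gompertz_hazard(age):
    """Gompertz force of mortality, exp(alpha + beta * age)."""
    return math.exp(ALPHA + BETA * age)


def integrated_hazard(start_age, width):
    """Exact integral of the Gompertz hazard over [start_age, start_age + width]."""
    return gompertz_hazard(start_age) * math.expm1(BETA * width) / BETA


def constant_hazard_approx(start_age, width):
    """The same integral if the hazard is held at its mid-interval value (assumption)."""
    return width * gompertz_hazard(start_age + width / 2.0)


def relative_error(start_age, width):
    """Relative error of the constant-hazard approximation against the exact integral."""
    exact = integrated_hazard(start_age, width)
    return constant_hazard_approx(start_age, width) / exact - 1.0


for width in (5.0, 1.0, 0.2):
    print(f"width {width}: relative error {relative_error(70.0, width):.4%}")
```

Under these assumptions the mid-point approximation understates the integrated hazard by roughly 1.4% for a five-year interval and by well under 0.1% for a one-year interval.  For a Gompertz hazard the relative error depends only on β and the width, not on the starting age.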

However, this process of splitting the exposures into shorter age intervals also reduces the number of lives in each group.  This brings its own problems for the Poisson model: it assigns a non-zero probability to there being more deaths in a group than there are people!  This problem is most apparent with small numbers of lives, a situation which will crop up more frequently when the age interval is shrunk.  This is what is happening in Table 1: when the age interval is 0.2, there are too many cells with low actual and expected numbers of deaths, thus causing a violation of the Poisson assumption.  Technically, we would describe the Poisson model as not well specified in relation to the task at hand.
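The probability of this impossible outcome is easy to compute.  The sketch below is a rough illustration rather than part of the fitted models: it assumes every life in a cell is exposed for the full interval, evaluates the Gompertz hazard at the mid-point of the interval, and uses the survival-model parameters quoted earlier.  The function names are hypothetical.

```python
import math

ALPHA, BETA = -12.455, 0.11793  # survival-model estimates quoted in the text


def poisson_prob_exceeds(lives, expected_deaths):
    """P(D > lives) for D ~ Poisson(expected_deaths), with expected_deaths > 0.

    The upper tail is summed directly, which avoids the cancellation that
    1 - CDF would suffer when the probability is tiny.
    """
    k = lives + 1
    # P(D = k), computed on the log scale for numerical stability.
    term = math.exp(k * math.log(expected_deaths) - expected_deaths - math.lgamma(k + 1))
    total = 0.0
    # Stop once a further term no longer changes the running total.
    while term > 0.0 and total + term != total:
        total += term
        k += 1
        term *= expected_deaths / k
    return total


def impossible_probability(lives, age, width):
    """Poisson probability of more deaths than lives in a single age cell."""
    hazard = math.exp(ALPHA + BETA * (age + width / 2.0))  # mid-interval hazard (assumption)
    expected = lives * width * hazard  # assumes full exposure for every life
    return poisson_prob_exceeds(lives, expected)


for lives in (1, 10):
    print(lives, impossible_probability(lives, 80.0, 1.0))
```

For a one-year cell starting at age 80, a single life gives a probability of roughly 0.13% of two or more deaths, while a cell of ten lives gives a probability of the order of 10^-11 of more than ten deaths.  The impossible outcomes therefore carry weight mainly in cells with very few lives.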

In many practical circumstances the Poisson model can be a reasonable working approximation, especially for very large data sets.  Indeed, the Poisson model can often be a useful independent check of the survival-model fit.  However, survival models at the level of the individual are preferable because they are better specified than the Poisson model for grouped counts and, in our tests, can be fitted faster.




Stephen Richards
Stephen Richards is the Managing Director of Longevitas