## Out for the count

In an earlier post we described a problem when fitting GLMs for *q*_{x} over multiple years. The key mistake is to divide up the period over which the individual was observed in a model for individual mortality. This violates the independence assumption and leads to parameter bias (amongst other undesirable consequences). If someone has three records, starting at ages 60, 61 and 62 respectively, then these are not independent trials: the mere existence of the record at age 62 tells you that there was no death at age 60 or 61.

Life-company data often comes as a series of in-force extracts, together with a list of movements. The usual procedure is to re-assemble the data to create a single record for each policy, using the policy number common to each in-force record or movement. The resulting policy-orientated data can then be deduplicated, and modelling proceeds from there.

However, we recently came across a reinsurance data set with an interesting problem. Each policy contributed one record for each calendar year, and was structured for a traditional actuarial *q*_{x}-style analysis. Unfortunately, the records did not have policy numbers or policyholder names, so it was impossible to re-assemble a single policy record. This made it very tricky to build a valid GLM for *q*_{x} at the level of the individual.

Interestingly, while a multi-year model for *q*_{x} at the individual level is invalid under such circumstances, an equivalent model for *μ*_{x} is still perfectly possible (leaving aside the problem of people holding duplicate policies, which will obviously require care in interpreting any outputs).

You may be wondering why the same data is invalid for a GLM for *q*_{x}, yet valid for a model for *μ*_{x}. Does the failure of independence between the three records at ages 60, 61 and 62 not affect both models? The answer is no, and the reason lies in what is actually being modelled. The individual-level GLM models a Bernoulli count of deaths, whereas the survival model models survival time. In the split-record example above, the trio of death counts for a single individual cannot possibly be (1, 0, 1), as the individual would not be alive after the first year to die again in the third. However, a model for *q*_{x} fed with these split records implicitly assumes that this is a possibility, resulting in incorrect standard errors (amongst other problems).
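To make the (1, 0, 1) point concrete, here is a minimal sketch of what happens if the three split records are treated as independent Bernoulli trials. The *q*_{x} values and the function names are hypothetical, chosen only for illustration; the code enumerates every possible trio of death indicators and adds up the probability given to trios that cannot occur for a single life whose three records all exist.

```python
from itertools import product

# Hypothetical one-year death probabilities at ages 60, 61 and 62
# (illustrative values only, not taken from any real table).
q = [0.010, 0.011, 0.012]

def independent_prob(outcome, q):
    """Probability of a trio of death indicators if each record is
    treated as an independent Bernoulli trial."""
    p = 1.0
    for d, qx in zip(outcome, q):
        p *= qx if d == 1 else 1.0 - qx
    return p

def is_possible(outcome):
    """With all three records present, a death can only appear
    in the final record: any earlier 1 is impossible."""
    return all(d == 0 for d in outcome[:-1])

outcomes = list(product([0, 1], repeat=len(q)))
impossible_mass = sum(independent_prob(o, q) for o in outcomes if not is_possible(o))

print("P(1, 0, 1) under independence:", independent_prob((1, 0, 1), q))
print("Total probability on impossible trios:", impossible_mass)
```

Under these assumed values the independent-trials model gives strictly positive probability to (1, 0, 1) and to every other trio with an early death, even though the existence of the later records rules them out.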

What about the survival model? Well, a survival model does not model counts like the *q*_{x} model, but the *survival time*. The basic data consists of a survival time and a death indicator taking the value 0 for survival or 1 for death. Whether we have a single survival time or chop it up into three pieces makes no difference to the log-likelihood, and thus no difference to the parameter estimates or standard errors. This is because the integrated hazard over the whole period is simply the sum of the integrated hazards over the pieces, and only the final piece can carry a death indicator of 1. The total survival time and the death indicator aren't independent of each other, but a survival model treats them jointly, so this dependence is already built in.
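The invariance of the log-likelihood to splitting can be checked numerically. The sketch below assumes a Gompertz hazard, *μ*_{x} = exp(α + βx), with made-up parameter values; the hazard form, the parameters and the function name are all assumptions for illustration, and any other hazard would behave the same way. Each record contributes the death indicator times the log-hazard at exit, minus the integrated hazard over the time observed.

```python
import math

# Hypothetical Gompertz parameters, chosen only for illustration.
ALPHA, BETA = -12.0, 0.12

def log_likelihood(age, time, died, alpha=ALPHA, beta=BETA):
    """Log-likelihood of one record: entry at `age`, observed for `time`
    years, with `died` = 1 for death at exit and 0 for survival."""
    log_mu_exit = alpha + beta * (age + time)
    # Closed-form integral of exp(alpha + beta * s) from age to age + time.
    integrated_hazard = (math.exp(log_mu_exit) - math.exp(alpha + beta * age)) / beta
    return died * log_mu_exit - integrated_hazard

# One record: enters at 60, dies 2.4 years later.
whole = log_likelihood(60.0, 2.4, 1)

# The same life chopped into three annual records; only the last carries the death.
pieces = [(60.0, 1.0, 0), (61.0, 1.0, 0), (62.0, 0.4, 1)]
split = sum(log_likelihood(a, t, d) for a, t, d in pieces)

print("Single record:", whole)
print("Split records:", split)
```

The two totals agree up to floating-point rounding, whatever values are chosen for α and β, which is why the parameter estimates and standard errors are unaffected by the split.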

So, data structured for a traditional actuarial *q*_{x}-style analysis can actually preclude a multi-year statistical model for *q*_{x}, and yet it can still be valid for a model for *μ*_{x}. You have to love the irony.
