Out for the count

In an earlier post we described a problem when fitting GLMs for qx over multiple years.  The key mistake is to divide up the period over which the individual was observed in a model for individual mortality.  This violates the independence assumption and leads to parameter bias (amongst other undesirable consequences). If someone has three records starting at ages 60, 61 and 62, then these are not independent trials: the mere existence of the record at age 62 tells you that there was no death at age 60 or 61.

Life-company data often comes as a series of in-force extracts, together with a list of movements.  The usual procedure is to re-assemble the data to create a single record for each policy, using the policy number common to each in-force record or movement.  The resulting policy-orientated data can then be deduplicated, and modelling proceeds from there.
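As a rough illustration of the re-assembly step, the sketch below collapses several yearly records into one record per policy.  The field layout (policy number, start age, end age, death indicator) and the sample rows are purely hypothetical, and real extracts and movement files would need far more validation than this.

```python
from collections import defaultdict

# Hypothetical extract rows: (policy_no, start_age, end_age, died).
# The layout and values are assumptions made for illustration only.
rows = [
    ("P001", 60.0, 61.0, 0),
    ("P001", 61.0, 62.0, 0),
    ("P001", 62.0, 62.4, 1),
    ("P002", 70.0, 71.0, 0),
]

def reassemble(rows):
    """Collapse per-year rows into one (start, end, died) tuple per policy."""
    grouped = defaultdict(list)
    for policy_no, start, end, died in rows:
        grouped[policy_no].append((start, end, died))
    merged = {}
    for policy_no, pieces in grouped.items():
        merged[policy_no] = (
            min(s for s, _, _ in pieces),  # earliest age under observation
            max(e for _, e, _ in pieces),  # latest age under observation
            max(d for _, _, d in pieces),  # 1 if any piece ends in death
        )
    return merged

policies = reassemble(rows)
```

Without a policy number there is no key to group on, so this step cannot be carried out.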

However, we recently came across a reinsurance data set with an interesting problem.  Each policy contributed one record for each calendar year, and was structured for a traditional actuarial qx-style analysis.  Unfortunately, the records did not have policy numbers or policyholder names, so it was impossible to re-assemble a single policy record.  This made it very tricky to build a valid GLM for qx at the level of the individual.

Interestingly, while a multi-year model for qx at the individual level is invalid under such circumstances, an equivalent model for μx is still perfectly possible (leaving aside the problem of people holding duplicate policies, which will obviously require care in interpreting any outputs).  You may be wondering why the same data is invalid for a GLM for qx, yet valid for a model for μx.  Does the failure of independence between the three records at ages 60, 61 and 62 not affect both models?  The answer is no, and the reason lies in what is actually being modelled.  The individual-level GLM is modelling a Bernoulli count of deaths, whereas the survival model is modelling survival time.  In the split-record example above, the trio of death counts for a single individual cannot possibly be (1, 0, 1), as the individual would obviously not be alive after the first year to die again in the third year.  However, a model for qx fed with these split records will implicitly assume that this is a possibility, resulting in unreliable standard errors (amongst other problems).
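To make the point concrete, the sketch below enumerates every death pattern for three split records and adds up the probability that an independent-Bernoulli model assigns to patterns that cannot occur.  The qx values are invented for illustration, and the function names are hypothetical.

```python
from itertools import product

# Illustrative one-year mortality rates at ages 60, 61 and 62 (assumed values).
q = [0.010, 0.011, 0.012]

def independent_prob(pattern, q):
    """Probability of a death pattern if the records were independent Bernoulli trials."""
    p = 1.0
    for d, qi in zip(pattern, q):
        p *= qi if d == 1 else 1.0 - qi
    return p

def is_possible(pattern):
    """With split records for one life, a death can only sit in the final record."""
    return 1 not in pattern[:-1]

# Total probability that the independence assumption gives to impossible outcomes.
impossible_mass = sum(
    independent_prob(pattern, q)
    for pattern in product((0, 1), repeat=len(q))
    if not is_possible(pattern)
)

print(independent_prob((1, 0, 1), q), impossible_mass)
```

Only (0, 0, 0) and (0, 0, 1) can actually arise when all three records exist, yet the independent model spreads some probability over the other six patterns.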

What about the survival model?  Well, a survival model does not model counts like the qx model, but the survival time.  The basic data consists of a survival time and a death indicator taking the value 0 for survival or 1 for death.  Provided each piece is treated as observed from its own start age, whether we have a single survival time or chop it up into three pieces makes no difference to the log-likelihood: the integrated hazards over the pieces add up to the integrated hazard over the whole period, and only the final piece can carry a death.  There is thus no difference to the parameter estimates or standard errors.  The survival time and the death indicator aren't independent of each other, but a survival model treats them as a dependent pair anyway.
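The additivity of the log-likelihood can be checked numerically.  The sketch below assumes a Gompertz hazard, μx = exp(α + βx), with arbitrary illustrative parameter values; neither the choice of hazard nor the numbers come from the data set discussed here.  Each record contributes minus the integrated hazard over its age interval, plus the log hazard at the exit age if it ends in death.

```python
import math

def integrated_hazard(a, b, alpha, beta):
    """Integral of the Gompertz hazard exp(alpha + beta*x) from age a to age b."""
    return (math.exp(alpha + beta * b) - math.exp(alpha + beta * a)) / beta

def log_lik(start, end, died, alpha, beta):
    """Log-likelihood of one record observed from age start to age end."""
    return -integrated_hazard(start, end, alpha, beta) + died * (alpha + beta * end)

# Arbitrary illustrative parameters (assumptions, not fitted values).
alpha, beta = -12.0, 0.12

# One life observed from age 60 until death at age 62.4, as a single record...
whole = log_lik(60.0, 62.4, 1, alpha, beta)

# ...and the same life chopped into three records.
pieces = [(60.0, 61.0, 0), (61.0, 62.0, 0), (62.0, 62.4, 1)]
split = sum(log_lik(s, e, d, alpha, beta) for s, e, d in pieces)

print(whole, split)
```

The two totals agree up to floating-point rounding for any parameter values, which is why the parameter estimates and standard errors are unaffected by the split.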

So, data structured for a traditional actuarial qx-style analysis can actually preclude a multi-year statistical model for qx, and yet it can still be valid for a model for μx.  You have to love the irony.




Stephen Richards
Stephen Richards is the Managing Director of Longevitas
Model types in Longevitas
Longevitas users can choose between seventeen types of survival model (μx) and seven types of GLM (qx). In addition there are a further seven extensions of the GLM models for qx to span multi-year data without violation of the independence assumption. Longevitas also offers non-parametric analysis, including Kaplan-Meier survival curves and traditional A/E comparisons against standard tables.