Out for the count

In an earlier post we described a problem when fitting GLMs for q_x over multiple years. The key mistake is to divide up the period over which the individual was observed in a model for individual mortality. This violates the independence assumption and leads to parameter bias (amongst other undesirable consequences). If someone has three records, starting at ages 60, 61 and 62, then these are not independent trials: the mere existence of the record at age 62 tells you that there was no death at age 60 or 61.

Life-company data often comes as a series of in-force extracts, together with a list of movements. The usual procedure is to re-assemble the data to create a single record for each policy, using the policy number common to each in-force record or movement. The resulting policy-orientated data can then be deduplicated and modelling proceeds from there.
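As a rough illustration of that re-assembly step, the sketch below collapses per-year records into one record per policy. The field names (policy_no, start_age, end_age, died) and the sample values are hypothetical, and it assumes each policy's records are contiguous in time; real extracts and movement files would need more care.

```python
from collections import defaultdict


def reassemble(records):
    """Collapse per-year records into a single record per policy.

    Each record is a dict with hypothetical keys:
    policy_no, start_age, end_age, died (0 or 1).
    Assumes the records for a policy cover one unbroken period.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["policy_no"]].append(rec)

    policies = []
    for policy_no, recs in sorted(grouped.items()):
        policies.append({
            "policy_no": policy_no,
            # Observation starts with the earliest record...
            "start_age": min(r["start_age"] for r in recs),
            # ...and ends with the latest one.
            "end_age": max(r["end_age"] for r in recs),
            # A death in any record is a death for the policy.
            "died": max(r["died"] for r in recs),
        })
    return policies


# Made-up extract: policy A1 has three yearly records, B2 has one.
extracts = [
    {"policy_no": "A1", "start_age": 60.0, "end_age": 61.0, "died": 0},
    {"policy_no": "A1", "start_age": 61.0, "end_age": 62.0, "died": 0},
    {"policy_no": "A1", "start_age": 62.0, "end_age": 62.4, "died": 1},
    {"policy_no": "B2", "start_age": 70.0, "end_age": 71.0, "died": 0},
]
single_records = reassemble(extracts)
```

Without the policy_no key there is nothing to group on, which is exactly the difficulty with the reinsurance data set described next.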

However, we recently came across a reinsurance data set with an interesting problem. Each policy contributed one record for each calendar year, and the data set was structured for a traditional actuarial q_x-style analysis. Unfortunately, the records did not have policy numbers or policyholder names, so it was impossible to re-assemble a single policy record. This made it very tricky to build a valid GLM for q_x at the level of the individual.

Interestingly, while a multi-year model for q_x at the individual level is invalid under such circumstances, an equivalent model for μ_x is still perfectly possible (leaving aside the problem of people holding duplicate policies, which will obviously require care in interpreting any outputs).

You may be wondering why the same data is invalid for a GLM for q_x, yet valid for a model for μ_x. Does the failure of independence between the three records at ages 60, 61 and 62 not affect both models? The answer is no, and the reason lies in what is actually being modelled. The individual-level GLM is modelling a Bernoulli count of deaths, whereas the survival model is modelling survival time. In the split-record example above, the trio of death counts for a single individual cannot possibly be (1, 0, 1), as the individual would obviously not be alive after the first year to die again in the third year. However, a model for q_x fed with these split records will implicitly assume that this is a possibility, resulting in false standard errors (amongst other problems).
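To make the point concrete, here is a small sketch of what treating the three records as independent Bernoulli trials implies. The mortality rates are made-up values chosen purely for illustration; the function names are ours, not part of any library.

```python
from itertools import product

# Hypothetical one-year mortality rates at ages 60, 61 and 62.
q = [0.010, 0.011, 0.012]


def independent_prob(outcome, rates):
    """Probability of a trio of death indicators if the records
    were (wrongly) treated as independent Bernoulli trials."""
    p = 1.0
    for died, qx in zip(outcome, rates):
        p *= qx if died == 1 else 1.0 - qx
    return p


def is_possible(outcome):
    """With three records in existence, a death can only
    appear in the last one: earlier records imply survival."""
    return sum(outcome[:-1]) == 0


outcomes = list(product([0, 1], repeat=3))
total_mass = sum(independent_prob(o, q) for o in outcomes)
impossible_mass = sum(
    independent_prob(o, q) for o in outcomes if not is_possible(o)
)

# The impossible history (1, 0, 1) gets a positive probability.
p_101 = independent_prob((1, 0, 1), q)
```

Here p_101 is small but strictly positive, and impossible_mass collects all the probability the independence assumption hands to life histories that cannot occur once all three records exist.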

What about the survival model? Well, a survival model does not model counts like the q_x model, but the survival time. The basic data consists of a survival time and a death indicator taking the value 0 for survival or 1 for death. Whether we have a single survival time or chop it up into three pieces makes no difference to the log-likelihood, and thus no difference to the parameter estimates or standard errors. The total survival time and the death indicator aren't independent of each other, but a survival model assumes they are dependent anyway.
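The invariance of the log-likelihood can be checked numerically. The sketch below assumes a Gompertz hazard with made-up parameters (the post does not specify a mortality law, so this choice is ours) and uses the usual survival-model contribution for a life observed from one age to another: the death indicator times the log hazard at exit, minus the integrated hazard over the period observed.

```python
import math

# Hypothetical Gompertz parameters: mu_x = exp(ALPHA + BETA * x).
ALPHA, BETA = -12.0, 0.12


def hazard(age):
    """Force of mortality at a given exact age."""
    return math.exp(ALPHA + BETA * age)


def integrated_hazard(start, end):
    """Integral of the Gompertz hazard between two ages."""
    return (hazard(end) - hazard(start)) / BETA


def log_likelihood(records):
    """Sum of contributions d * log(mu at exit) - integrated hazard,
    where each record is (start_age, end_age, died)."""
    total = 0.0
    for start, end, died in records:
        total += died * math.log(hazard(end)) - integrated_hazard(start, end)
    return total


# One individual observed from age 60 who dies at age 62.4...
whole = [(60.0, 62.4, 1)]
# ...and the same history chopped into three records.
split = [(60.0, 61.0, 0), (61.0, 62.0, 0), (62.0, 62.4, 1)]

ll_whole = log_likelihood(whole)
ll_split = log_likelihood(split)
```

The two values agree up to floating-point rounding, because the integrated hazards of the pieces simply add up to the integrated hazard of the whole, and only the final piece carries the death.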

So, data structured for a traditional actuarial q_x-style analysis can actually preclude a multi-year statistical model for q_x, and yet it can still be valid for a model for μ_x. You have to love the irony.

Written by: Stephen Richards

Publication Date: 31 July 2009

Last Updated: 31 July 2009

Tags: survival models, force of mortality, GLM, missing data

Model types in Longevitas

Longevitas users can choose between seventeen types of survival model (μ_x) and seven types of GLM (q_x). In addition there are a further seven extensions of the GLM models for q_x to span multi-year data without violation of the independence assumption. Longevitas also offers non-parametric analysis, including Kaplan-Meier survival curves and traditional A/E comparisons against standard tables.
