## The Alias Problem

A problem that can crop up during mortality modelling is that of aliasing, specifically extrinsic aliasing.  The situation can be illustrated by an example of the sort of data available for a pension scheme.  Suppose that members accruing benefits have a job classification, which can be used to model mortality differentials.  Let's assume the classification is M (manual) and O (office), and that this coding is available for all member pensioners.  Of course, pension schemes also pay benefits to surviving spouses, but unless they also worked for the same employer they won't have a job classification.  We therefore have a job-classification factor with three levels: M, O and S (for spouses).

Another common risk factor is health status at retirement: many schemes offer discretionary early ill-health retirement with undiscounted pensions, and member pensions are often coded with an indicator for this, e.g. I (for ill-health) and N (for normal retirement).  As before, surviving spouses won't have a health status at retirement, so we have a health factor with three levels: I, N and S (for spouses again).

It is easy to understand why a mortality model should consider both job classification and retirement health: blue-collar manual labourers tend to have higher mortality rates than white-collar office workers, and people retiring in ill health have higher mortality rates than those retiring in normal health.  In this example, however, the coding of each factor creates a problem for a statistical model: the S group in the job classification specifies exactly the same lives as the S group in the retirement-health factor.  These two levels in two separate factors are extrinsically aliased by the structure of the data.  Fitting a statistical model with these two factors is tricky.  The modelling software will be able to work out the effect of job classifications M and O separately from the effect of retirement-health statuses I and N, as long as there are at least some people with each of the four possible combinations.  However, the software will not be able to work out separate effects for "job classification" S and "retirement health" S, since these two codes cover exactly the same lives: any effect attributed to one could equally well be attributed to the other.  This is illustrated in Table 1.
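To make the aliasing concrete, here is a minimal Python sketch (using NumPy) that builds a treatment-coded design matrix for a handful of hypothetical lives. The records, the column order and the choice of M and N as baseline levels are illustrative assumptions, not data from any real scheme. Because spouses carry the code S in both factors, the job=S and health=S indicator columns are identical, so the matrix has five columns but only rank four.

```python
import numpy as np

# Hypothetical toy records: (job classification, retirement-health status).
# Spouses carry the code "S" in BOTH factors, as described in the text.
# All four member combinations (MI, MN, OI, ON) are present.
lives = [
    ("M", "I"), ("M", "N"), ("O", "I"), ("O", "N"),
    ("M", "N"), ("O", "N"), ("S", "S"), ("S", "S"),
]

def design_matrix(records):
    """Treatment-coded design matrix: intercept plus dummies for each factor.

    Baselines (chosen arbitrarily for this sketch): job M and health N.
    Columns: intercept, job=O, job=S, health=I, health=S.
    """
    rows = []
    for job, health in records:
        rows.append([
            1.0,
            float(job == "O"),
            float(job == "S"),
            float(health == "I"),
            float(health == "S"),
        ])
    return np.array(rows)

X = design_matrix(lives)
n_columns = X.shape[1]
rank = int(np.linalg.matrix_rank(X))
spouse_columns_identical = bool(np.array_equal(X[:, 2], X[:, 4]))

print("columns:", n_columns)   # 5
print("rank:", rank)           # 4, so one column is redundant
print("job=S column equals health=S column:", spouse_columns_identical)  # True
```

How a given package reacts to such a rank-deficient matrix (dropping a column, reporting an undefined coefficient, or failing) depends on the software, but in every case the separate S effects cannot be estimated.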

Table 1. Cross-classification of job type and retirement-health factors.  × denotes risk-factor combinations present amongst the individual lives.

| Job type | I: Ill-health retirees | N: Normal retirees | S: Spouses (health unknown) |
|---|:---:|:---:|:---:|
| M: Manual | × | × | |
| O: Office | × | × | |
| S: Spouses | | | × |

Table 1 shows where the data gaps are, and why it is not possible to fit the job-type and retirement-health factors as they are currently constructed.  One option would of course be to exclude the spouses from the model.  However, this would be a terrible waste of data: the spouses cannot contribute to the estimation of mortality by job type and retirement health, but they can contribute to the estimation of factors like age, gender and time trend.  Besides, spouses' pensions are a scheme liability and a mortality basis will be needed to value this liability.  Another option would be not to fit the job-type and retirement-health factors at all.  However, it is equally unappealing to ignore available information that might be highly relevant to mortality.

My preferred solution in cases of extrinsic aliasing is to create a single combined factor with one level for each combination present in the data.  In the example shown in Table 1 we would define a single factor with five levels: MI, MN, OI, ON and S.  This allows us to use all the data records, spouses included, and it also allows us to consider all the available risk factors.
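A minimal sketch of the combined factor, again in Python with NumPy and again on hypothetical records. The helper names `combined_level` and `combined_design_matrix`, the level ordering and the choice of MN as the baseline are illustrative assumptions. With the five-level factor the design matrix has full column rank, so each level gets its own estimable effect.

```python
import numpy as np

# Hypothetical toy records: (job classification, retirement-health status).
lives = [
    ("M", "I"), ("M", "N"), ("O", "I"), ("O", "N"),
    ("M", "N"), ("O", "N"), ("S", "S"), ("S", "S"),
]

# Levels of the combined factor; the first is used as the baseline (an
# arbitrary choice for this sketch).
LEVELS = ["MN", "MI", "OI", "ON", "S"]

def combined_level(job, health):
    """Map (job, health) codes to one level of the five-level factor.

    Spouses must be coded "S" in both source factors and become level "S";
    members get the concatenated code, e.g. ("M", "I") -> "MI".
    """
    if job == "S" or health == "S":
        if job != health:
            raise ValueError(f"inconsistent spouse coding: {job!r}, {health!r}")
        return "S"
    return job + health

def combined_design_matrix(records):
    """Intercept plus one dummy per non-baseline level of the combined factor."""
    codes = [combined_level(job, health) for job, health in records]
    rows = [[1.0] + [float(code == level) for level in LEVELS[1:]] for code in codes]
    return np.array(rows), codes

X, codes = combined_design_matrix(lives)
print(codes)  # ['MI', 'MN', 'OI', 'ON', 'MN', 'ON', 'S', 'S']
print("columns:", X.shape[1], "rank:", int(np.linalg.matrix_rank(X)))  # columns: 5 rank: 5
```

The consistency check in `combined_level` is a design choice: it flags any record where only one of the two factors carries the spouse code, which would indicate a data problem rather than a genuine sixth level.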