## Part of the story

The Institute of Actuaries' sessional meeting on 28th September 2009 discussed an interesting paper.  It covered similar material to that in Richards (2008), but used different methods and different data.  Nevertheless, some important results were confirmed: geodemographic type codes are important predictors of mortality, and a combination of geodemographic profile and pension size is better than either factor on its own.  The authors also added an important new insight, namely that last-known salary was a much better predictor than pension size.

The models in the paper were GLMs for qx, which require complete years of exposure.  The authors were rightly concerned that just using complete years would lose important data:

"Dropping all partial exposure cases would be an unnecessary reduction of information. However, including those records with partial exposure in the binomial GLM, without adjusting for the part year nature, could lead to a systematic under-estimation of the qx. We therefore weight the contribution of each of the membership records according to its exposure to risk in a year"

Madrigal et al (2009), paragraph 4.3.1

Unfortunately, if this means what I think it does, then the authors' approach to incomplete years causes the very problem they are trying to avoid.  They are taking observations which by definition have artificially low observed mortality by virtue of being observed for only part of a year, and then including them in the model with a smaller weight than observations for a complete year (or potential year).  All this achieves is to reduce the influence of the observations with incomplete years; it does not allow for the incompleteness of the years themselves.  The effect would be to produce lower mortality rates than should be the case, i.e. the resulting estimates would be biased downwards relative to the true underlying mortality rates.  The extent of any bias would depend directly on the relative proportions of part-year and whole-year observations.

To illustrate this, consider the extreme situation where the only data you have consist of partial years of exposure.  Imagine an annualised mortality rate of 0.2 amongst a group of 100 identical individuals.  In a complete year we would therefore expect 20 deaths.  If we only had half a year's exposure, then there would be on average only 10 deaths: 20 × 0.5 = 10, assuming a uniform distribution of deaths (UDD) throughout the year.  However, if records are weighted according to exposure, every death and every life carries the same weight of 0.5, so the weights cancel and the estimated annual mortality rate would be (10 × 0.5) / (100 × 0.5) = 0.1, half the true rate of 0.2.  In their admirable aim of trying to include fractional years of exposure, the authors may have inadvertently created a material source of bias in their model.
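The arithmetic in this example can be checked with a short Python sketch.  It assumes my reading of the weighting scheme is correct, i.e. that each record's binomial contribution is simply multiplied by its exposure, and the function names are my own, not the authors'.  For a single homogeneous group the weighted binomial estimate reduces to weighted deaths over weighted lives, which is compared here with a UDD-style estimate that divides deaths by the total potential exposure.

```python
def weighted_binomial_rate(deaths, weights):
    """Weighted binomial estimate for one homogeneous group:
    sum(weight * death indicator) / sum(weight)."""
    weighted_deaths = sum(w * d for d, w in zip(deaths, weights))
    return weighted_deaths / sum(weights)


def udd_rate(deaths, potential_exposure):
    """Estimate of annual q under UDD: deaths divided by the total
    potential exposure (in years) of all lives."""
    return sum(deaths) / sum(potential_exposure)


# The example from the text: 100 identical lives, each observable for
# half a year, with the 10 deaths expected when the annual rate is 0.2.
n_lives = 100
exposure = [0.5] * n_lives
deaths = [1] * 10 + [0] * (n_lives - 10)

q_weighted = weighted_binomial_rate(deaths, exposure)
q_udd = udd_rate(deaths, exposure)

print(f"exposure-weighted estimate: {q_weighted:.3f}")
print(f"UDD-adjusted estimate:      {q_udd:.3f}")
```

Because every record has the same weight, the weights cancel and the first estimate is just 10 / 100; only the second recovers the annual rate used to construct the example.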

There are a number of ways to properly allow for fractional exposure in qx models, including UDD, the Balducci assumption and others.  However, the cleanest approach is simply to use survival models.  These seamlessly allow for fractional years of exposure because they model the time to an event, not the number of events occurring.
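As a rough illustration of why survival models cope with fractional exposure, here is a minimal simulation sketch in Python.  It assumes the simplest possible survival model, a constant hazard, which is my choice for illustration and is not the UDD assumption used in the example above; the seed, sample size and variable names are likewise arbitrary.  Every life is observed for at most half a year, yet the estimate of the annual mortality rate is not dragged downwards, because the model uses the time actually observed for each life.

```python
import math
import random

random.seed(2009)  # arbitrary seed, for reproducibility only

TRUE_Q = 0.2                            # annual mortality rate, as in the example
TRUE_HAZARD = -math.log(1.0 - TRUE_Q)   # constant hazard giving that annual rate
OBSERVATION_WINDOW = 0.5                # every life is observed for half a year at most
N_LIVES = 200_000                       # large, so that sampling error is small

deaths = 0
time_observed = 0.0
for _ in range(N_LIVES):
    lifetime = random.expovariate(TRUE_HAZARD)
    if lifetime < OBSERVATION_WINDOW:
        # Death observed: the life contributes its time lived up to death.
        deaths += 1
        time_observed += lifetime
    else:
        # Survivor: censored at the end of the half-year window.
        time_observed += OBSERVATION_WINDOW

# Maximum-likelihood estimate of a constant hazard: deaths / time observed.
hazard_hat = deaths / time_observed
q_hat = 1.0 - math.exp(-hazard_hat)

print(f"estimated hazard:   {hazard_hat:.4f} (true {TRUE_HAZARD:.4f})")
print(f"implied annual q:   {q_hat:.4f} (true {TRUE_Q:.4f})")
```

A real application would use a hazard that varies with age and other risk factors, but the treatment of partial exposure is the same: each life contributes exactly the time for which it was observed.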