## Dealing with missing data

In an earlier post we looked at how to create a proxy for ill-health early retirements based on age at commencement. This is an example of dealing with missing data: we infer a useful proxy to replace the lost or missing health status at retirement.

Another common problem arises during data or system migrations, when historical experience data is often not carried across to the new administration system. Migrations happen when a life office consolidates multiple systems into one, or when a pension scheme changes administrator. System migrations aren't easy, and migrating historical data is usually near the bottom of the priority list. As a result, it is often one of the first tasks to be dropped when time gets tight, which is why many systems contain only partial mortality data.

Such situations naturally affect mortality investigations. In particular, exposure calculations cannot include the pre-migration period if deaths data have not been migrated as well. Including that period would understate mortality rates, because exposure would be counted without the corresponding deaths. Migrations are not always done as a single action, so it might not be as simple as counting exposure from a single date for all policies. At worst, the data might have been migrated in stages, raising the problem of deduplicating records that were migrated at different times.
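To make the exposure point concrete, here is a minimal sketch of counting exposure and deaths only from each policy's own migration date. It assumes a hypothetical record layout (`Policy` with `entry_date`, `exit_date`, `died` and `migration_date`) and measures time in years of 365.25 days; neither is taken from any particular administration system.

```python
from dataclasses import dataclass
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: simple year length for illustration


@dataclass
class Policy:
    # Hypothetical record layout: field names are illustrative only.
    policy_id: str
    entry_date: date      # date the policy came on risk
    exit_date: date       # date of death or other exit (or a date after study end if in force)
    died: bool            # True if exit_date is a date of death
    migration_date: date  # date the record arrived on the current system


def exposure_and_deaths(policy: Policy, study_start: date, study_end: date) -> tuple[float, int]:
    """Return (years of exposure, deaths), counting only from the migration date onwards."""
    # Exposure cannot start before migration, because deaths before that
    # date were never recorded on this system.
    start = max(policy.entry_date, policy.migration_date, study_start)
    end = min(policy.exit_date, study_end)
    if end <= start:
        return 0.0, 0
    years = (end - start).days / DAYS_PER_YEAR
    # Since end > start here, a death on or before study_end falls inside
    # the same window in which exposure was counted.
    deaths = 1 if policy.died and policy.exit_date <= study_end else 0
    return years, deaths


if __name__ == "__main__":
    study_start, study_end = date(2014, 1, 1), date(2020, 12, 31)
    policies = [
        Policy("A1", date(2005, 1, 1), date(2020, 12, 31), False, date(2015, 1, 1)),
        Policy("B2", date(2010, 6, 1), date(2018, 3, 1), True, date(2016, 1, 1)),
        Policy("C3", date(2000, 1, 1), date(2012, 5, 1), True, date(2015, 1, 1)),
    ]
    for p in policies:
        print(p.policy_id, exposure_and_deaths(p, study_start, study_end))
```

Policy `C3` exited before it was migrated, so it contributes neither exposure nor a death. If the data were migrated in stages, duplicate records for the same policy would need to be merged before this step; otherwise the same exposure would be counted twice.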

One solution is to look at the payment or premium records linked to each policy, and to use the earliest date of activity as a proxy for the migration date. The earliest evidence of activity on the new administration system is likely to be very close to the date when the policy was actually migrated. For pensions in payment, one would look at the earliest payment date. For term-assurance policies, one would look at the premium-collection records and use the earliest premium collected on the new system. For deferred pensions there are unfortunately neither payments nor premium collections, which is one reason why the data for such business is seldom usable for mortality investigations.
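The earliest-activity proxy can be sketched as below. This assumes, hypothetically, that payment and premium records are available as `(policy_id, date)` pairs; the function names are illustrative. Policies with no activity at all, such as deferred pensions, get `None` and would have to be left out of the investigation.

```python
from datetime import date
from typing import Iterable, Optional


def earliest_activity(records: Iterable[tuple[str, date]]) -> dict[str, date]:
    """Map each policy ID to the earliest date among its activity records."""
    earliest: dict[str, date] = {}
    for policy_id, activity_date in records:
        current = earliest.get(policy_id)
        if current is None or activity_date < current:
            earliest[policy_id] = activity_date
    return earliest


def proxy_migration_dates(
    policy_ids: Iterable[str],
    payments: Iterable[tuple[str, date]],
    premiums: Iterable[tuple[str, date]],
) -> dict[str, Optional[date]]:
    """Use the earliest payment or premium date as a proxy for the migration date."""
    first_payment = earliest_activity(payments)   # pensions in payment
    first_premium = earliest_activity(premiums)   # e.g. term assurance
    result: dict[str, Optional[date]] = {}
    for pid in policy_ids:
        candidates = [d for d in (first_payment.get(pid), first_premium.get(pid)) if d is not None]
        # No activity at all (e.g. a deferred pension): no usable proxy.
        result[pid] = min(candidates) if candidates else None
    return result


if __name__ == "__main__":
    payments = [("P1", date(2016, 2, 1)), ("P1", date(2016, 1, 1)), ("P1", date(2016, 3, 1))]
    premiums = [("T1", date(2017, 5, 15)), ("T1", date(2017, 6, 15))]
    print(proxy_migration_dates(["P1", "T1", "D1"], payments, premiums))
```

The proxy can only be later than the true migration date, never earlier, so exposure based on it errs slightly on the cautious side by dropping a short period after migration.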

Actuaries often find themselves in the position of working with less-than-perfect data.  However, intelligent use of other data items can compensate for missing information.
