## Spotting quality issues with limited data

In an earlier posting I showed how to use the Kaplan-Meier function to identify subtle data problems.  However, what can you do when you don't have the detailed information to build a full survival curve?  In a recent consulting engagement we were only provided with crude aggregate mortality rates for five-year age bands. This is a nuisance, because such summarisation loses important details in the data.

We had a strong suspicion that the data were of poor quality, and that the problem once again lay with the male-female mortality differential.  We therefore calculated the survival rates for males and females in five-year intervals for the portfolio in question and compared the survival differential with some other benchmarks.  The results are shown in Table 1.

Table 1. Difference in male-female survival rates from age 60 for various mortality tables and portfolios (female survival rate – male survival rate).  Source: Own calculations using lives-weighted mortality.

| Survival from 60 to age | (1) SAPS table S2PL | (2) Interim life tables 2009–2011 | (3) Bulk-annuity portfolio A | (4) Bulk-annuity portfolio B |
|---|---|---|---|---|
| 70 | 4.3% | 4.2% | 3.0% | 4.8% |
| 75 | 7.9% | 7.5% | 5.2% | 5.8% |
| 80 | 11.9% | 11.0% | 8.4% | 4.4% |
| 85 | 14.5% | 13.3% | 11.5% | 8.9% |

The different groups have widely differing levels of mortality, ranging from the population mortality of the interim life tables to the mortality of private pensioners.  The calculations also apply to slightly different periods of time.  Nevertheless, there is a degree of consistency in the differential survival rates between males and females for columns 1–3.  Over the twenty-year range from age 60 to 80 we would expect a differential of between 8% and 12%.  This makes the differential of 4.4% for the last column look rather odd.  Furthermore, the differential widens steadily with age for columns 1–3, whereas for column 4 it actually narrows between ages 75 and 80 (from 5.8% to 4.4%).  This makes us rather suspicious about the data for portfolio B, and it raises questions about any mortality basis derived from it.
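The two checks described above can be automated. The following is a minimal sketch using the figures from Table 1. The function names are illustrative, and treating the 8%–12% range at age 80 as a hard threshold is an assumption made for this example rather than a general rule.

```python
# Female-minus-male survival differentials from Table 1, in percentage points.
ages = [70, 75, 80, 85]
differentials = {
    "SAPS table S2PL": [4.3, 7.9, 11.9, 14.5],
    "Interim life tables 2009-2011": [4.2, 7.5, 11.0, 13.3],
    "Bulk-annuity portfolio A": [3.0, 5.2, 8.4, 11.5],
    "Bulk-annuity portfolio B": [4.8, 5.8, 4.4, 8.9],
}


def widens_steadily(values):
    """True if each differential is strictly larger than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


def flag_suspect(differentials, ages, check_age=80, lower=8.0, upper=12.0):
    """Return the names of columns that fail either check.

    Check 1: the differential at check_age lies within [lower, upper].
    Check 2: the differential widens steadily with age.
    The default bounds are an assumption taken from the benchmark columns.
    """
    idx = ages.index(check_age)
    flagged = []
    for name, values in differentials.items():
        in_range = lower <= values[idx] <= upper
        if not (in_range and widens_steadily(values)):
            flagged.append(name)
    return flagged


suspects = flag_suspect(differentials, ages)
print(suspects)  # ['Bulk-annuity portfolio B']
```

Only portfolio B fails, and it fails both checks. In practice the bounds would need to reflect whichever benchmarks are relevant to the portfolio in question.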

As with the previous posting on this topic, this data problem could not be detected with an A/E comparison against a standard table.  However, here the standard table has actually proved useful indirectly: by (i) calculating the survival rates under the standard table and (ii) comparing the excess female survival rate to that of the portfolio in question, we can see that there is something wrong with the data for bulk-annuity portfolio B.
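For readers who want to reproduce this kind of comparison, here is a rough sketch of steps (i) and (ii) starting from crude rates in five-year age bands. It assumes the rates are central mortality rates and that the force of mortality is constant within each band, so that survival over a band is `exp(-5 × rate)`; the calculation behind Table 1 may have differed. The function names and the rates below are hypothetical placeholders, not data from the engagement.

```python
import math


def survival_from_60(band_rates, band_width=5):
    """Cumulative survival probabilities from age 60 to the end of each band.

    band_rates: crude central mortality rates for consecutive age bands,
    starting with the band that begins at age 60.
    Assumption: constant force of mortality within each band.
    """
    survival = []
    cumulative_hazard = 0.0
    for rate in band_rates:
        cumulative_hazard += band_width * rate
        survival.append(math.exp(-cumulative_hazard))
    return survival


def survival_differential(female_rates, male_rates, band_width=5):
    """Female survival rate minus male survival rate at the end of each band."""
    female = survival_from_60(female_rates, band_width)
    male = survival_from_60(male_rates, band_width)
    return [f - m for f, m in zip(female, male)]


# Hypothetical crude rates for bands 60-64, 65-69, 70-74, 75-79 and 80-84.
male_rates = [0.010, 0.016, 0.027, 0.046, 0.080]
female_rates = [0.006, 0.010, 0.018, 0.032, 0.058]

end_ages = [65, 70, 75, 80, 85]
diffs = survival_differential(female_rates, male_rates)
for age, diff in zip(end_ages, diffs):
    print(f"Survival from 60 to {age}: differential {diff:.1%}")
```

Applying the same function to the standard table and to the portfolio gives directly comparable columns of differentials.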
