Spotting quality issues with limited data

In an earlier posting I showed how to use the Kaplan-Meier function to identify subtle data problems.  However, what can you do when you don't have the detailed information to build a full survival curve?  In a recent consulting engagement we were only provided with crude aggregate mortality rates for five-year age bands. This is a nuisance, because such summarisation loses important details in the data.

We had a strong suspicion that the data were of poor quality, and that the problem once again lay with the male-female mortality differential.  We therefore calculated the survival rates for males and females in five-year intervals for the portfolio in question and compared the survival differential with some other benchmarks.  The results are shown in Table 1.

Table 1. Difference in male-female survival rates from age 60 for various mortality tables and portfolios (female survival rate – male survival rate).  Source: Own calculations using lives-weighted mortality.

Survival from     (1) SAPS table      (2) Interim life    (3) Bulk-annuity    (4) Bulk-annuity
60 to age         S2PL                tables 2009–2011    portfolio A         portfolio B
70                4.3%                4.2%                3.0%                4.8%
75                7.9%                7.5%                5.2%                5.8%
80                11.9%               11.0%               8.4%                4.4%
85                14.5%               13.3%               11.5%               8.9%


The different groups have widely differing levels of mortality, from the population mortality of the interim life tables to that of private pensioners.  The calculations also apply to slightly different periods of time.  Nevertheless, the differential survival rates between males and females are reasonably consistent across columns 1–3.  Over the twenty-year range from age 60 to 80, they suggest a differential of between 8% and 12%.  This makes the differential of 4.4% in column 4 look rather odd.  Furthermore, the differential widens steadily with age in columns 1–3, whereas in column 4 it falls from 5.8% at age 75 to 4.4% at age 80.  This makes us rather suspicious about the data for portfolio B, and it raises questions about any mortality basis derived from it.
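
The two checks described above can be sketched in Python. This is a rough illustration only, not the procedure used in the engagement. The figures are transcribed from Table 1. The function names, and the choice to treat columns 1–3 as the benchmark set, are assumptions made for this sketch.

```python
# Female-minus-male survival differentials from Table 1, in percentage
# points, for survival from age 60 to each attained age.
AGES = [70, 75, 80, 85]
TABLE_1 = {
    "SAPS S2PL": [4.3, 7.9, 11.9, 14.5],
    "Interim life tables 2009-2011": [4.2, 7.5, 11.0, 13.3],
    "Portfolio A": [3.0, 5.2, 8.4, 11.5],
    "Portfolio B": [4.8, 5.8, 4.4, 8.9],
}
# Assumption for this sketch: columns 1-3 serve as the benchmarks.
BENCHMARKS = ["SAPS S2PL", "Interim life tables 2009-2011", "Portfolio A"]


def widens_steadily(diffs):
    """True if each differential is larger than the one before it."""
    return all(later > earlier for earlier, later in zip(diffs, diffs[1:]))


def benchmark_range(age):
    """Smallest and largest benchmark differential at the given age."""
    i = AGES.index(age)
    values = [TABLE_1[name][i] for name in BENCHMARKS]
    return min(values), max(values)


def outside_benchmark_range(name, age):
    """True if the named column falls outside the benchmark range at this age."""
    low, high = benchmark_range(age)
    value = TABLE_1[name][AGES.index(age)]
    return not (low <= value <= high)


for name, diffs in TABLE_1.items():
    print(name, "| widens with age:", widens_steadily(diffs),
          "| outside benchmark range at 80:", outside_benchmark_range(name, 80))
```

Neither check proves that the data are wrong. Each one simply flags a column that departs from the pattern shared by the others and so deserves a closer look.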

As with the previous posting on this topic, this data problem could not be detected with an A/E comparison against a standard table.  However, here the standard table has actually proved useful indirectly: by (i) calculating the survival rates under the standard table and (ii) comparing the excess female survival rate to that of the portfolio in question, we can see that there is something wrong with the data for bulk-annuity portfolio B.
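
Steps (i) and (ii) might look like the following in Python. The post does not say how the aggregate rates were defined, so this sketch assumes they are central mortality rates per annum, constant within each five-year band; if the rates were annual probabilities of death, the survival step would differ. The function names are hypothetical.

```python
import math


def survival_from_60(band_rates, band_width=5):
    """Cumulative survival probabilities from age 60 to the end of each band.

    band_rates: central mortality rates per annum for the consecutive bands
    60-64, 65-69, ..., each assumed constant within its band (an assumption
    of this sketch, not something stated in the post).
    Returns a dict mapping attained age (65, 70, ...) to survival probability.
    """
    survival = {}
    cumulative_hazard = 0.0
    age = 60
    for rate in band_rates:
        cumulative_hazard += band_width * rate
        age += band_width
        survival[age] = math.exp(-cumulative_hazard)
    return survival


def female_excess_survival(female_rates, male_rates):
    """Female survival rate minus male survival rate at each attained age."""
    female = survival_from_60(female_rates)
    male = survival_from_60(male_rates)
    return {age: female[age] - male[age] for age in female}
```

Applying `female_excess_survival` first to the standard table's rates and then to the portfolio's rates gives two sets of differentials that can be compared age by age, as in Table 1.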
