Spotting quality issues with limited data

In an earlier posting I showed how to use the Kaplan-Meier function to identify subtle data problems. However, what can you do when you don't have the detailed information to build a full survival curve? In a recent consulting engagement we were only provided with crude aggregate mortality rates for five-year age bands. This is a nuisance, because such summarisation loses important details in the data.

We had a strong suspicion that the data were of poor quality, and that the problem once again lay with the male-female mortality differential. We therefore calculated the survival rates for males and females in five-year intervals for the portfolio in question and compared the survival differential with some other benchmarks. The results are shown in Table 1.

Table 1. Difference in male-female survival rates from age 60 for various mortality tables and portfolios (female survival rate – male survival rate). Source: Own calculations using lives-weighted mortality.

	1	2	3	4
Survival from 60 to age	SAPS table S2PL	Interim life tables 2009–2011	Bulk-annuity portfolio A	Bulk-annuity portfolio B
70	4.3%	4.2%	3.0%	4.8%
75	7.9%	7.5%	5.2%	5.8%
80	11.9%	11.0%	8.4%	4.4%
85	14.5%	13.3%	11.5%	8.9%

The different groups have widely differing levels of mortality, from the population mortality of the interim life tables to that of private pensioners. The calculations also apply to slightly different periods of time. Nevertheless, there is a degree of consistency in the differential survival rates between males and females for columns 1–3. Over the twenty-year range from age 60 to 80 these three columns show a differential of between 8% and 12%, which makes the 4.4% in column 4 look rather odd. Furthermore, the differential widens steadily with age for columns 1–3, whereas in column 4 it falls from 5.8% at age 75 to 4.4% at age 80. This makes us rather suspicious about the data for portfolio B, and it raises questions about any mortality basis derived from it.
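
The two checks described above can be sketched in a few lines of Python. This is only an illustration using the figures from Table 1; the function names and the 8%–12% benchmark range at age 80 are simply taken from the discussion above and are not part of any standard toolkit.

```python
# Female-minus-male survival differentials from Table 1, in percentage points,
# for survival from age 60 to ages 70, 75, 80 and 85.
AGES = [70, 75, 80, 85]
TABLE_1 = {
    "SAPS table S2PL": [4.3, 7.9, 11.9, 14.5],
    "Interim life tables 2009–2011": [4.2, 7.5, 11.0, 13.3],
    "Bulk-annuity portfolio A": [3.0, 5.2, 8.4, 11.5],
    "Bulk-annuity portfolio B": [4.8, 5.8, 4.4, 8.9],
}


def widens_steadily(values):
    """True if each differential is larger than the one at the previous age."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


def outside_benchmark_at_80(values, low=8.0, high=12.0):
    """True if the age-80 differential lies outside the range seen in columns 1-3."""
    at_80 = values[AGES.index(80)]
    return not (low <= at_80 <= high)


def flag_suspect(table):
    """Names of columns that fail either check."""
    return [
        name
        for name, values in table.items()
        if not widens_steadily(values) or outside_benchmark_at_80(values)
    ]


print(flag_suspect(TABLE_1))
```

Only portfolio B is flagged, and it fails both checks. Neither test proves the data are wrong, of course; they merely tell us where to start asking questions.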

As with the previous posting on this topic, this data problem could not be detected with an A/E comparison against a standard table. However, here the standard table has actually proved useful indirectly: by (i) calculating the survival rates under the standard table and (ii) comparing the excess female survival rate to that of the portfolio in question, we can see that there is something wrong with the data for bulk-annuity portfolio B.
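
For readers who want to try steps (i) and (ii) themselves, here is a minimal sketch of how survival rates from age 60 might be built up from crude rates in five-year bands. It assumes that each crude rate is a central mortality rate that is constant within its band, so that survival over a band is exp(-5m); this assumption is ours for illustration, and the rates in the example are hypothetical values, not those of any table or portfolio mentioned above.

```python
import math


def survival_from_60(band_rates, band_width=5):
    """Cumulative survival from age 60 to the end of each age band.

    band_rates maps the starting age of each band (60, 65, ...) to a crude
    central mortality rate, assumed constant within the band.
    """
    survival = {}
    cumulative_hazard = 0.0
    for start_age in sorted(band_rates):
        cumulative_hazard += band_width * band_rates[start_age]
        survival[start_age + band_width] = math.exp(-cumulative_hazard)
    return survival


def survival_differential(female_rates, male_rates):
    """Female survival rate minus male survival rate at each band end."""
    female = survival_from_60(female_rates)
    male = survival_from_60(male_rates)
    return {age: female[age] - male[age] for age in female}


# Hypothetical crude rates by band start age, for illustration only.
male_rates = {60: 0.010, 65: 0.016, 70: 0.027, 75: 0.045, 80: 0.078}
female_rates = {60: 0.006, 65: 0.010, 70: 0.017, 75: 0.030, 80: 0.055}

differential = survival_differential(female_rates, male_rates)
for age in (70, 75, 80, 85):
    print(f"Survival from 60 to {age}: differential {differential[age]:.1%}")
```

Running the same function over the standard table and over the portfolio gives two sets of differentials that can be laid side by side, as in Table 1.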

Written by: Stephen Richards

Publication Date: 10 March 2014

Last Updated: 10 March 2014

Tags: data validation, survival rates, standard table
