Visualising data-quality in time

(Nov 30, 2020)

In a recent blog post I defined the Nelson-Aalen estimate with respect to calendar time, rather than with respect to age as is usual.  I showed how a simple difference of this estimate could be used to reveal seasonal patterns in mortality, and also how it could identify shocks like COVID-19.  However, this time-based non-parametric estimator also turns out to be handy for detecting data-quality issues.

To recap, the Nelson-Aalen estimate of the integrated hazard from time \(y\) to \(y+t\) is denoted \(\hat\Lambda_{y,t}\); it is defined as follows:

\[\hat\Lambda_{y,t} = \sum_{t_i\le t}\frac{d_{y+t_i}}{l_{y+t_i^-}}\qquad (1)\]

for a set \(\{y+t_i\}\) of distinct times (dates) of death with \(d_{y+t_i}\)…
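The sum in equation (1) is straightforward to compute directly. Below is a minimal sketch, assuming deaths are supplied as a list of dates and the lives at risk just before each date are supplied by a caller-provided function (both names and the interface are illustrative, not from the original post):

```python
from collections import Counter

def nelson_aalen_calendar(death_dates, lives_at_risk):
    """Time-based Nelson-Aalen estimate of the integrated hazard.

    death_dates:   list of dates of death (any sortable type)
    lives_at_risk: function mapping a date to the number of lives
                   exposed to risk just before that date (l with the
                   t^- superscript in equation (1))
    Returns a list of (date, cumulative hazard) pairs, one per
    distinct date of death.
    """
    d = Counter(death_dates)          # d_{y+t_i}: deaths at each distinct date
    cumulative = 0.0
    estimate = []
    for date in sorted(d):
        cumulative += d[date] / lives_at_risk(date)
        estimate.append((date, cumulative))
    return estimate
```

Differencing successive values of the cumulative hazard then gives the crude hazard over each interval, which is what reveals seasonal patterns and shocks.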

Read more

Tags: data validation, missing data, Nelson-Aalen

Spotting quality issues with limited data

(Mar 10, 2014)

In an earlier posting I showed how to use the Kaplan-Meier function to identify subtle data problems.  However, what can you do when you don't have the detailed information to build a full survival curve?  In a recent consulting engagement we were only provided with crude aggregate mortality rates for five-year age bands. This is a nuisance, because such summarisation loses important details in the data.

We had a strong suspicion that the data were of poor quality, and that the problem once again lay with the male-female mortality differential.  We therefore calculated the survival rates for males and females in five-year intervals for the portfolio in question and compared the survival differential with…
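A calculation of this kind can be sketched as follows. The figures below are purely illustrative (they are not the portfolio data from the engagement), and the simplifying assumption is that each crude annual rate applies to all five years of its band:

```python
def banded_survival(annual_q):
    """Cumulative survival at the end of each five-year band,
    assuming the crude annual mortality rate q applies to every
    year within its band."""
    survival = []
    s = 1.0
    for q in annual_q:
        s *= (1.0 - q) ** 5
        survival.append(s)
    return survival

# Illustrative crude annual rates for bands 60-64, 65-69, ..., 80-84
q_male = [0.010, 0.017, 0.030, 0.052, 0.091]
q_female = [0.006, 0.011, 0.020, 0.037, 0.070]

# Male-female survival differential by band boundary
differential = [f - m for m, f in
                zip(banded_survival(q_male), banded_survival(q_female))]
```

If the differential looks implausibly narrow (or inverted) relative to a standard table, that is a warning sign that sexes have been mis-coded or records mixed in the source data.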

Read more

Tags: data validation, survival rates, standard table

Spotting hidden data-quality issues

(Nov 3, 2013)

The growing market for longevity risk-transfer means that takers of the risk are keenly interested in the mortality characteristics of the portfolio concerned. The first thing requested by the risk-taker is therefore detailed data on the portfolio's recent mortality experience.  This is ideally data extracted on a policy-by-policy basis. Once received, the careful analyst checks that the data are sound.  Failure to spot data problems at the start will at best waste time, and at worst lead to concluding a deal on bad terms.  There is therefore tremendous value in simple checks of data quality.

We saw in an earlier post how survival models can reveal data problems.  However, these issues can sometimes be spotted…

Read more

Tags: data validation, Kaplan-Meier

Special Assignment

(Sep 14, 2011)

We talked previously about the use of user-defined validation rules to clean up specific data artefacts you sometimes find in portfolio data. One question came up recently about modelling bespoke benefit bands, a task where user-defined rules can also help.

In our modelling system we automatically calculate a user-selected number of benefit bands, each containing a broadly equal number of lives. The model optimiser can be used to cluster these bands, giving you the best-fitting break points for your experience data. A drawback is that the optimised break-points might not correspond to any pre-established business convention. So, what do you do if you want a constant banding for use with all files?


Read more

Tags: technology, data validation, deduplication

Business benefits of statistical models

(Mar 25, 2011)

In a recent meeting I was asked by a reinsurer what the advantages were of using statistical models in his business. The reinsurer knew about the greater analytical power of survival models, but he wanted more. One reason I gave is that because survival models are built at the level of the individual, it is often easier to spot data problems which would otherwise be invisible to an actuary using traditional methods.

As it happened, a good example of this cropped up the following week. A pension scheme was looking at the feasibility of a longevity swap. During the quotation process an updated extract of the pensioner data was provided, ostensibly to give an up-to-date picture of the annual pension amounts. A cursory…

Read more

Tags: data validation, residual, survival models

Rewriting the rulebook

(Dec 2, 2010)

It is an unfortunate fact of life that through time every portfolio will acquire data artefacts that make risk analysis trickier. Policyholder duplication is one example of this and archival of claims breaking the time-series is another. Data errors introduced by servicing are perhaps the most commonplace of all, and this posting describes how validation rules can protect the modelling stage from such errors.

The first class of issue is generic data corruption, so called because these problems occur with the same characteristics in more or less every portfolio you work with. Generic validation rules are critical here, screening out such problems before modelling commences. These issues include…
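A generic validation rule is just a check that applies to any portfolio. The sketch below shows the flavour of such rules; the field names and the particular checks are illustrative assumptions, not the rule set described in the post:

```python
def validate_record(rec):
    """Return a list of generic validation failures for one policy
    record, represented as a dict.  Field names are illustrative."""
    errors = []
    dob = rec.get("date_of_birth")
    dod = rec.get("date_of_death")
    if dob is None:
        errors.append("missing date of birth")
    if dob is not None and dod is not None and dod <= dob:
        errors.append("date of death not after date of birth")
    if rec.get("annual_pension", 0) < 0:
        errors.append("negative pension amount")
    if rec.get("sex") not in ("M", "F"):
        errors.append("unknown sex code")
    return errors
```

Records failing any rule are quarantined for inspection before modelling commences, so that the model only ever sees data that has passed the screen.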

Read more

Tags: technology, data validation
