Extracting the (data) value

Risk management is about properly understanding your risk factors and managing them accordingly. In a modern actuarial context, this is about a lot more than a simple comparison against a standard table. The best business practice is to fit a statistical model, and for mortality and other work we recommend using survival models.

The first step in building a mortality model is to extract your mortality data from a payment or administration system. This is typically a relatively straightforward query run against your administration database. The first group of data fields comprises the mandatory ones, required as an absolute minimum:

Date of birth (mandatory). This should ideally be provided in the CCYY-MM-DD format to avoid confusion between European- and US-style date formats.
Gender (mandatory). Must be M or F.
Commencement date (mandatory). The commencement date of the pension or annuity, which is either the retirement date (in the case of a first life) or the date of death of the first life (in the case of a surviving spouse's benefit). Again, this should ideally be provided in the CCYY-MM-DD format.
Benefit amount (mandatory). For pensions work, this should be the annualised pension, i.e. twelve times the monthly pension if paid monthly.
End date (mandatory). For pensions still in payment, this is the date of extract in CCYY-MM-DD format. For deaths, this is the date of death. For temporary pensions, such as those paid to children or other dependants, this is the date the pension ceased.
Status (mandatory). If the beneficiary was alive at the end date, this is 0. If the end date is the date of death, this is 1.
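
The mandatory fields above lend themselves to a simple automated check before any modelling starts. The following is a minimal sketch only: the field names (`date_of_birth`, `commencement_date`, and so on) and the function `validate_record` are hypothetical, and the sanity checks on date ordering and on a positive benefit amount are assumptions rather than rules stated above.

```python
from datetime import date


def validate_record(rec: dict) -> list[str]:
    """Return a list of problems found in one extracted record (empty if none)."""
    problems = []
    dates = {}

    # All three dates are expected in CCYY-MM-DD (ISO 8601) format.
    for field in ("date_of_birth", "commencement_date", "end_date"):
        try:
            dates[field] = date.fromisoformat(rec[field])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{field}: missing or not CCYY-MM-DD")

    if rec.get("gender") not in ("M", "F"):
        problems.append("gender: must be M or F")

    if rec.get("status") not in (0, 1):
        problems.append("status: must be 0 (alive) or 1 (dead)")

    # Assumption: an annualised benefit should be a positive number.
    try:
        if float(rec["benefit_amount"]) <= 0:
            problems.append("benefit_amount: expected a positive annualised amount")
    except (KeyError, TypeError, ValueError):
        problems.append("benefit_amount: missing or not numeric")

    # Assumption: birth precedes commencement, which is no later than the end date.
    if len(dates) == 3:
        if not dates["date_of_birth"] < dates["commencement_date"] <= dates["end_date"]:
            problems.append("dates: expected birth < commencement <= end")

    return problems
```

A record that passes returns an empty list, so the extract can be filtered with something like `[r for r in records if validate_record(r)]` to list the rows needing attention.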

An important question is what to do when some of this critical data is not available. One example is the original commencement date, which is sometimes unavailable as a result of system migrations. In such cases the commencement date can sometimes be substituted.

The second group of data fields is technically optional, as a mortality model can be built without them. However, the availability of these data fields will substantially enhance the quality of the modelling work, and in some cases their absence can even invalidate the model.

Surname (optional). The surname of the beneficiary is useful for deduplication, which is essential for work with annuity portfolios.
Forename (optional). The forename of the beneficiary is also useful for deduplication, and can be an indicator of problems with the gender code (for example where records have been reassigned to the gender of the surviving spouse). Sometimes the forename field can contain titles or honorifics.
Postcode (optional). A hierarchical postcode (UK), zip code (USA) or postal code (Canada and the Netherlands) for use in geodemographic profiling. Note that when we write "postcode" in the UK, we mean the full postcode. Partial postcodes are markedly less useful, and can even be misleading.
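
To illustrate how these fields might support deduplication, here is a rough sketch that groups records sharing a normalised matching key. The choice of key (surname, first initial, date of birth, gender and postcode), the list of titles to strip, and the names `dedup_key` and `group_duplicates` are all assumptions made for this example. Real deduplication rules are usually more forgiving of spelling variations.

```python
import re
from collections import defaultdict

# Hypothetical list of titles that sometimes appear in the forename field.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "sir", "prof", "rev"}


def dedup_key(rec: dict) -> tuple:
    """Build a normalised matching key from the identifying fields."""
    # Keep letters only, so "O'Neil" and "ONEIL" compare equal.
    surname = "".join(ch for ch in rec["surname"].casefold() if ch.isalpha())
    # Split the forename into words and drop titles or honorifics.
    words = re.findall(r"[^\W\d_]+", rec["forename"].casefold())
    names = [w for w in words if w not in TITLES]
    initial = names[0][0] if names else ""
    # Compare full postcodes without regard to spacing or case.
    postcode = re.sub(r"\s+", "", rec["postcode"].upper())
    return (surname, initial, rec["date_of_birth"], rec["gender"], postcode)


def group_duplicates(records: list[dict]) -> list[list[dict]]:
    """Return the groups of records that share the same matching key."""
    groups = defaultdict(list)
    for rec in records:
        groups[dedup_key(rec)].append(rec)
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group is a candidate set of records for the same person, for example one annuitant holding several policies.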

The third group comprises other data fields whose availability is entirely dependent on what has been recorded in your administration system. Some of these data fields can be very useful in mortality modelling. As a general rule, a data field is more likely to be useful if it can take only a small number of values. The ideal is a binary variable, allowing a simple contrast between two sub-groups, but any compact categorical variable can be investigated as a significant risk factor for mortality. Examples of possible variables include:

Smoker status (optional). Seldom available for pensioner work, but very commonly available for assurance risks.
First life or surviving spouse (optional). Some pension-administration systems know whether the benefit is being paid to the first life or a surviving spouse.
Retirement status (optional). Some pension schemes keep markers for ill-health, early and normal retirements.
Marital status (optional). Some pension schemes keep details of whether a pensioner is married or not.
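
Before fitting a survival model, a first look at a compact categorical variable can be had by tabulating deaths and time observed for each of its values. The sketch below is illustrative only: the function `crude_rates` and the field names are hypothetical, exposure is approximated as days between commencement and end date divided by 365.25, and the resulting crude rates take no account of age, so they are no substitute for a proper model.

```python
from collections import defaultdict
from datetime import date


def crude_rates(records: list[dict], factor: str) -> dict:
    """Tabulate deaths, exposure in years and crude death rate per factor level."""
    totals = defaultdict(lambda: [0, 0.0])  # level -> [deaths, exposure in years]
    for rec in records:
        start = date.fromisoformat(rec["commencement_date"])
        end = date.fromisoformat(rec["end_date"])
        level_totals = totals[rec[factor]]
        level_totals[0] += rec["status"]  # status is 1 for a death, 0 otherwise
        level_totals[1] += (end - start).days / 365.25
    return {
        level: {
            "deaths": deaths,
            "exposure_years": exposure,
            "rate": deaths / exposure if exposure > 0 else None,
        }
        for level, (deaths, exposure) in totals.items()
    }
```

Calling `crude_rates(records, "smoker_status")` would return one entry per smoker status found in the data. A variable with many sparsely populated levels will show up immediately as a long table with few deaths per row.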

Once you've gathered all the data, all that remains is for it to be validated, and then the modelling work can begin.

Written by: Gavin Ritchie

Publication Date: 01 August 2012

Last Updated: 01 August 2012

Data handling in Longevitas

Longevitas validates, cleans and optionally augments data on upload using both standard and bespoke rules. The optional deduplication phase also contains validation, checking for conflicts in the status of duplicate records. Configuration controls allow data validation and deduplication to be adjusted and rerun as often as necessary.
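
As a generic illustration of what a status-conflict check on duplicate records could look like, consider the sketch below. It is not a description of how Longevitas implements the check: the function `status_conflicts`, the field names and the two rules shown are assumptions for the example.

```python
def status_conflicts(group: list[dict]) -> list[str]:
    """Describe conflicts among records believed to belong to one person."""
    problems = []
    statuses = {rec["status"] for rec in group}
    # For deaths, the end date is the date of death.
    death_dates = {rec["end_date"] for rec in group if rec["status"] == 1}

    if statuses == {0, 1}:
        problems.append("marked both alive and dead")
    if len(death_dates) > 1:
        problems.append("more than one date of death")
    return problems
```

An empty list means the duplicate records agree with each other on status and date of death.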

The built-in data handling features are designed to help you arrive at a dataset suitable for modelling in the shortest possible time.
