The enduring need for deduplication

In Macdonald et al. (2018, Section 2.5) we describe the importance of deduplication, i.e. the identification of individuals behind multiple policies. This is a critical step for a statistical model: lives can be regarded as independent, whereas the mortality experiences of two or more policies written on the same life clearly cannot.

Deduplication has a long history in actuarial investigations. Downes (1862) described the creation of a card-index system for an experience investigation. A card was created for each policy, containing the name and date of birth of the policyholder, together with the exposure period. The cards were then sorted and individuals with multiple policies identified; a single substitute card was then created that amalgamated the exposure periods for the various policies. Thus, the series of overlapping policy exposures in Figure 1 would become the single life exposure period in Figure 2.

Figure 1.  Three policies written on the same life.  Source: Downes (1862, page 14).

Three separate overlapping policy exposures with different start and end points.

Figure 2.  Single substitute exposure record created from the three policies in Figure 1.  Source: Downes (1862, page 14).

Single replacement exposure from earliest starting point to latest end point.
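
The amalgamation shown in Figures 1 and 2 can be sketched in a few lines of Python. This is only an illustration, not Downes' procedure as recorded: the function name and the dates are hypothetical, and the sketch assumes the policy exposures overlap, as they do in Figure 1. If there were gaps between policies, taking the earliest start and latest end would overstate the time exposed to risk.

```python
from datetime import date


def amalgamate(exposures):
    """Collapse overlapping (start, end) exposure periods for one life
    into a single period from the earliest start to the latest end."""
    if not exposures:
        raise ValueError("need at least one exposure period")
    starts, ends = zip(*exposures)
    return min(starts), max(ends)


# Three hypothetical overlapping policies on the same life (illustrative dates only).
policies = [
    (date(1840, 3, 1), date(1852, 6, 30)),
    (date(1845, 1, 15), date(1858, 12, 31)),
    (date(1850, 7, 1), date(1861, 12, 31)),
]

single = amalgamate(policies)
print(single)
```

The result is one exposure record per life, which is what the substitute card in Figure 2 represents.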

Downes (1862) employed a combination key to identify duplicates, i.e. where adjacent cards had the following matching data items:

  1. Surname

  2. Forename

  3. Date of birth
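
A modern equivalent of sorting cards and comparing neighbours is to group records on the combination key. The sketch below is a minimal illustration under stated assumptions: the record layout, the names and the whitespace/case normalisation are all hypothetical, and none of it is taken from Downes (1862) or from any particular software.

```python
from collections import defaultdict
from datetime import date


def combination_key(policy):
    """Build the (surname, forename, date of birth) key for one policy.
    Stripping whitespace and upper-casing are assumptions, not part of the source."""
    return (
        policy["surname"].strip().upper(),
        policy["forename"].strip().upper(),
        policy["dob"],
    )


def deduplicate(policies):
    """Group policies that share a combination key, i.e. belong to the same life."""
    lives = defaultdict(list)
    for policy in policies:
        lives[combination_key(policy)].append(policy)
    return dict(lives)


# Hypothetical policy records for illustration only.
policies = [
    {"policy_id": 1, "surname": "Smith", "forename": "John", "dob": date(1801, 5, 2)},
    {"policy_id": 2, "surname": "SMITH ", "forename": "John", "dob": date(1801, 5, 2)},
    {"policy_id": 3, "surname": "Brown", "forename": "Mary", "dob": date(1810, 9, 17)},
]

lives = deduplicate(policies)
policies_per_life = len(policies) / len(lives)
print(len(lives), policies_per_life)
```

Grouping with a dictionary gives the same result as sorting and comparing adjacent cards, since records with identical keys end up together either way.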

Using this approach, Downes (1862, page 19) identified 9,335 lives covered by 11,945 life-insurance policies, i.e. an average of 1.28 policies per person.  By way of comparison, we found an average of 1.24 policies per life in the annuity portfolio in Richards and Currie (2009, page 320).  That's two different classes of business separated by 150 years, and yet the same phenomenon is common to both.  As long as insurers and pension schemes have policy- and benefit-orientated administration records, actuaries will need to deduplicate for their experience investigations.

References:

Downes, J. J. (1862). An Account of the Processes employed in getting out the Mortality Experience of the Economic Life Assurance Society.

Macdonald, A. S., Richards, S. J. and Currie, I. D. (2018). Modelling Mortality with Actuarial Applications. Cambridge University Press, Cambridge, ISBN 9781107051386, doi:10.1017/9781107051386.

Richards, S. J. and Currie, I. D. (2009). Longevity risk and annuity pricing with the Lee-Carter model, British Actuarial Journal, Volume 15, No. 2, pages 317-365, doi:10.1017/S1357321700005675.

 

Thanks to David Raymont, Librarian at the Institute and Faculty of Actuaries, for sourcing a copy of Downes (1862).

Written by: Stephen Richards

Deduplication in Longevitas

Longevitas offers ten different combination keys for deduplication, including the optional use of postcodes and/or phonetic algorithms for variant name spellings.
