The enduring need for deduplication
In Macdonald et al. (2018, Section 2.5) we describe the importance of deduplication, i.e. the identification of individuals behind multiple policies. This is a critical step for a statistical model, as lives can be regarded as independent, whereas the mortality experiences of two or more policies written on the same life clearly are not.
Deduplication has a long history in actuarial investigations. Downes (1862) described the creation of a card-index system for an experience investigation. A card was created for each policy, containing the name and date of birth of the policyholder, together with the exposure period. The cards were then sorted and individuals with multiple policies identified; a single substitute card was then created that amalgamated the exposure periods for the various policies. Thus, the overlapping policy exposures in Figure 1 would become the single exposure period for the life in Figure 2.
Figure 1. Three policies written on the same life. Source: Downes (1862, page 14).

Figure 2. Single substitute exposure record created from the three policies in Figure 1. Source: Downes (1862, page 14).
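In modern terms, Downes's amalgamation of exposure periods is the union of overlapping date intervals. A minimal Python sketch follows, assuming each policy's exposure is held as a (start, end) pair of dates. The function name `merge_exposures` and the dates are our own invention for illustration; they are not taken from Figure 1.

```python
from datetime import date


def merge_exposures(periods):
    """Amalgamate the exposure periods of several policies on one life.

    Periods that overlap (or share an endpoint) are combined, so that time
    covered by more than one policy is counted only once. Returns a sorted
    list of disjoint (start, end) periods.
    """
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous period: extend it if this one ends later.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical exposures for three policies written on the same life.
policies = [
    (date(1840, 3, 1), date(1850, 6, 30)),
    (date(1845, 1, 1), date(1855, 12, 31)),
    (date(1852, 7, 1), date(1858, 4, 30)),
]
print(merge_exposures(policies))
# [(datetime.date(1840, 3, 1), datetime.date(1858, 4, 30))]
```

Periods that do not overlap are kept separate, so a life with a gap in cover keeps that gap in its single-life exposure record.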

Downes (1862) used a combination key to identify duplicates: after sorting, adjacent cards were treated as relating to the same life where the following data items all matched:
Surname
Forename
Date of birth
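The sort-then-compare-neighbours procedure translates naturally into code. The Python sketch below is a minimal illustration, not a reproduction of Downes's or Longevitas's method: the `Card` record layout, the names `combination_key` and `deduplicate`, and the normalisation of letter case and surrounding spaces are all our own assumptions.

```python
from datetime import date
from itertools import groupby
from typing import NamedTuple


class Card(NamedTuple):
    # Hypothetical record layout: one card per policy.
    policy_id: str
    surname: str
    forename: str
    date_of_birth: date


def combination_key(card):
    # Surname, forename and date of birth; names are upper-cased and
    # stripped of surrounding spaces (an assumption, not Downes's rule).
    return (card.surname.strip().upper(), card.forename.strip().upper(), card.date_of_birth)


def deduplicate(cards):
    """Sort cards by the combination key and group adjacent matches.

    Returns a dict mapping each key (one per life) to its policy IDs.
    """
    ordered = sorted(cards, key=combination_key)
    # groupby only groups *consecutive* equal keys, like comparing adjacent cards.
    return {key: [c.policy_id for c in group]
            for key, group in groupby(ordered, key=combination_key)}


cards = [
    Card("P1", "Smith", "John", date(1801, 5, 4)),
    Card("P2", "Jones", "Mary", date(1810, 2, 1)),
    Card("P3", "SMITH ", "john", date(1801, 5, 4)),
    Card("P4", "Smith", "John", date(1803, 9, 9)),
]
lives = deduplicate(cards)
print(len(cards), len(lives), round(len(cards) / len(lives), 2))  # 4 3 1.33
```

Sorting first is what makes the comparison of neighbours sufficient: every card for a given key ends up in one contiguous run.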
Using this approach, Downes (1862, page 19) identified 9,335 lives covered by 11,945 life-insurance policies, i.e. an average of 1.28 policies per person. By way of comparison, we found an average of 1.24 policies per life in the annuity portfolio in Richards and Currie (2009, page 320). That's two different classes of business separated by 150 years, and yet the same phenomenon is common to both. As long as insurers and pension schemes have policy- and benefit-orientated administration records, actuaries will need to deduplicate for their experience investigations.
References:
Downes, J. J. (1862). An Account of the Processes employed in getting out the Mortality Experience of the Economic Life Assurance Society.
Macdonald, A. S., Richards, S. J. and Currie, I. D. (2018). Modelling Mortality with Actuarial Applications. Cambridge University Press, Cambridge, ISBN 9781107051386, doi:10.1017/9781107051386.
Richards, S. J. and Currie, I. D. (2009). Longevity risk and annuity pricing with the Lee-Carter model. British Actuarial Journal, Volume 15, No. 2, pages 317-365, doi:10.1017/S1357321700005675.
Thanks to David Raymont, Librarian at the Institute and Faculty of Actuaries, for sourcing a copy of Downes (1862).
Deduplication in Longevitas
Longevitas offers ten different combination keys for deduplication, including the optional use of postcodes and/or phonetic algorithms for variant name spellings.
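Phonetic algorithms map names that sound alike to the same code, so that variant spellings can still be matched. As an illustration only, the sketch below implements American Soundex, one well-known phonetic algorithm, and uses it in a looser combination key. We are not suggesting that Soundex is the algorithm, or that `phonetic_key` is one of the keys, used in Longevitas; both function names are hypothetical.

```python
from datetime import date

# American Soundex digit codes; vowels and Y get no code.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate letters that share a code.
            continue
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        # A vowel resets 'previous', so a repeated code after it is kept.
        previous = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def phonetic_key(surname, forename, date_of_birth):
    # Hypothetical looser key: Soundex of the surname, normalised forename, date of birth.
    return (soundex(surname), forename.strip().upper(), date_of_birth)


print(soundex("Smith"), soundex("Smyth"), soundex("Robert"), soundex("Rupert"))
# S530 S530 R163 R163
```

The trade-off is visible in the output: the looser key matches Smith with Smyth, but it would also match Robert with Rupert, so phonetic keys find more genuine duplicates at the cost of more false matches.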