Deduplication and annuities
Deduplication is an important step in data preparation for mortality modelling (or any other kind of modelling, for that matter). If people in your data set have multiple benefit records, then the crucial independence assumption for statistical modelling is invalidated. An effective algorithm for identifying duplicates is described in a paper presented to the Institute of Actuaries.
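To make the idea concrete, here is a minimal sketch of deduplication in Python. It is not the algorithm from the paper mentioned above. It simply assumes that two benefit records belong to the same person when they share date of birth, gender, surname and postcode after light normalisation. The field names (`dob`, `gender`, `surname`, `postcode`, `annual_amount`) and the records themselves are hypothetical.

```python
from collections import defaultdict

def person_key(record):
    # Hypothetical matching key (an assumption, not the paper's method):
    # date of birth, gender, upper-cased surname, postcode without spaces.
    surname = record["surname"].strip().upper()
    postcode = record["postcode"].replace(" ", "").upper()
    return (record["dob"], record["gender"], surname, postcode)

def deduplicate(records):
    # Collapse benefit records into one entry per person, counting
    # policies and summing annual annuity amounts.
    people = defaultdict(lambda: {"policies": 0, "annual_amount": 0.0})
    for r in records:
        person = people[person_key(r)]
        person["policies"] += 1
        person["annual_amount"] += r["annual_amount"]
    return dict(people)

# Made-up records: the first two differ only in formatting.
records = [
    {"dob": "1940-03-14", "gender": "M", "surname": "Smith",
     "postcode": "EH1 1AA", "annual_amount": 1200.0},
    {"dob": "1940-03-14", "gender": "M", "surname": "SMITH ",
     "postcode": "eh11aa", "annual_amount": 800.0},
    {"dob": "1952-07-01", "gender": "F", "surname": "Jones",
     "postcode": "G1 2BB", "annual_amount": 500.0},
]

people = deduplicate(records)
print(len(records), "records ->", len(people), "people")
```

An exact-match key like this will miss duplicates where someone has moved house or a surname has been mistyped, so a real scheme would probably need more than one matching rule.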
The problem of duplicates is a major issue for annuity portfolios, where it is very common for people to have multiple policies. On average I expect around 1.2 annuities per person, although this is obviously portfolio-specific. I also find that the average number of annuities per person tends to increase with age. This might partly be product-driven, as older annuitants are more likely to be self-employed holders of old-style s226 policies, whereas at younger ages there are more CPAs from personal pensions. However, the increase in average policies per person with age also appears to be correlated with annuitants' socio-economic status and wealth. This makes deduplication even more important, as the duplicate problem is linked to one of the important variables you are trying to analyse.
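Once the data have been deduplicated, the average number of annuities per person in each age band is easy to check. The sketch below assumes each person has been reduced to an (age, number of policies) pair. The function name, the ten-year bands and the figures are purely illustrative and are not taken from any real portfolio.

```python
from collections import defaultdict

def policies_per_person_by_age(people, band_width=10):
    # people: iterable of (age, number_of_policies) pairs, one per person.
    totals = defaultdict(lambda: [0, 0])  # band start -> [policies, people]
    for age, policies in people:
        band = (age // band_width) * band_width
        totals[band][0] += policies
        totals[band][1] += 1
    # Average policies per person in each band, in ascending age order.
    return {band: n_policies / n_people
            for band, (n_policies, n_people) in sorted(totals.items())}

# Made-up example data, for illustration only.
people_example = [(62, 1), (67, 1), (71, 1), (74, 2), (83, 1), (88, 3)]
ratios = policies_per_person_by_age(people_example)
for band, ratio in ratios.items():
    print(f"ages {band}-{band + 9}: {ratio:.2f} annuities per person")
```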
I did once hear a claim from a many-tentacled consultancy that the independence assumption wasn't important for large portfolios. This is obviously untrue, since an average of 1.2 annuities per person ruins the independence assumption equally for large and small data sets. Indeed, large portfolios tend to exhibit ever-greater extremes. In one very large annuity portfolio I analysed in 2005, I found a man with no fewer than 32 separate annuities.