What's in a name?

We have already mentioned the problem of duplication in pension schemes and annuities. As it is an issue we encounter frequently, it is worth talking a little about some technology that can be used to counter it.

What we find in practice is that the unique member identifiers used within financial administration systems are all too frequently, well, not unique. We know that converting policy- or benefit-orientated data into individual, person-orientated data is vital statistically, but how can this be done reliably?

The answer is to use a combination of other data attributes present for each member to create a deduplication key around which multiple records can be merged. One common case would be to merge records which shared a common birthdate, name, gender and postcode.
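As a minimal sketch of the idea, exact-key grouping might look like the following in Python. The field names, policy references and the third record are hypothetical, and the raw surname is used in the key purely for illustration; this is not a description of any particular product's implementation.

```python
from collections import defaultdict


def group_by_key(records):
    """Group record dicts by (date of birth, surname, gender, postcode)."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            rec["date_of_birth"],
            rec["surname"].strip().upper(),
            rec["gender"].strip().upper(),
            # Ignore spacing and case differences in the postcode.
            rec["postcode"].replace(" ", "").upper(),
        )
        groups[key].append(rec)
    return groups


# Hypothetical records: field names and policy references are assumptions.
records = [
    {"policy": "A1", "date_of_birth": "25/09/1948", "surname": "Smith",
     "gender": "M", "postcode": "EH4 2DA"},
    {"policy": "B7", "date_of_birth": "25/09/1948", "surname": "smith",
     "gender": "M", "postcode": "eh42da"},
    {"policy": "C3", "date_of_birth": "01/01/1950", "surname": "Jones",
     "gender": "F", "postcode": "AB1 2CD"},
]

# Any key shared by more than one record marks a candidate duplicate.
duplicates = {k: v for k, v in group_by_key(records).items() if len(v) > 1}
```

Here the two Smith records share a key and become candidates for merging, while the third record stands alone.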

To do this there are a few issues with names that need to be addressed:

  • Names often appear with and without embedded titles - from the relatively mundane Mr, Mrs and Ms, to those ennobled between policy purchases (it happens; not in my life, but it happens). Titles need to be recognised and extracted prior to merging.
  • Forenames are commonly abbreviated and suffer from variant spelling - Stephen, Steven, Steve - and may even be truncated to the first initial. We eliminate these challenges by working exclusively with the first initial, although weaker deduplication schemes can leave out the forename altogether.
  • Surnames or family names have far greater variant spelling potential than forenames. Business transacted by tele-servicing in particular is less likely to trap and correct these variants upfront, so they often affect policy records. As an example, my surname can commonly appear as any of Ritche, Richie or Richey, along with a host of less common variants. But as the family name is an important part of the deduplication key we need to harness it. So we use double metaphone.
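The first two points can be sketched in Python as follows. This is only an illustration: the TITLES set is a short, hypothetical list rather than a complete one, and the function names are assumptions.

```python
import re

# Illustrative and deliberately incomplete; a real scheme needs a fuller list.
TITLES = {"MR", "MRS", "MS", "MISS", "DR", "SIR", "DAME", "LORD", "LADY"}


def split_title(forename_field):
    """Return (titles, remaining forenames) from a raw forename field."""
    tokens = re.split(r"\s+", forename_field.strip())
    titles = []
    # Peel off leading tokens that look like titles, with or without a full stop.
    while tokens and tokens[0].rstrip(".").upper() in TITLES:
        titles.append(tokens.pop(0).rstrip("."))
    return titles, " ".join(tokens)


def first_initial(forename_field):
    """Reduce a forename field to its first initial, ignoring any titles."""
    _, forenames = split_title(forename_field)
    return forenames[:1].upper()
```

With this, "Sir Gavin", "Gavin" and "G" all reduce to the same initial, so the forename no longer blocks a match.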

Double metaphone is an algorithm developed by Lawrence Philips. It looks through variant spellings by reducing surnames to phonetic codes. The "double" in the name stems from the fact that returning up to two codes for a single surname allows the algorithm to deal with common-case Anglo-Saxon and foreign-pronunciation variants simultaneously.

As an example, say we have three annuity records in a portfolio:

Date of Birth   Surname   Forename    Postcode   Gender   Surname Metaphone
25/09/1948      Smith     G           EH4 2DA    M        SM0 / XMT
25/09/1948      Smythe    Sir Gavin   EH4 2DA    M        SM0 / XMT
25/09/1948      Schmidt   Gavin       EH4 2DA    M        XMT / SMT

Although these surnames differ and would fail a straightforward text match, double metaphone shows that all three match on some combination of the primary and alternate phonetic codes. Primary to primary matches - as in Smith and Smythe - are the strongest, but even our alternate to primary match with Schmidt indicates a likely duplicate in the presence of the other corroborating attributes.
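The comparison itself is simple once the codes are known. The sketch below takes the (primary, alternate) codes from the example above as given rather than computing them, and the "strong" and "weak" labels are our own illustrative naming, not part of the algorithm.

```python
def match_strength(codes_a, codes_b):
    """Compare two (primary, alternate) code pairs.

    Returns "strong" for a primary to primary match, "weak" for any other
    shared code, and None when the pairs have no code in common.
    """
    primary_a, _ = codes_a
    primary_b, _ = codes_b
    if primary_a and primary_a == primary_b:
        return "strong"
    # Ignore empty codes so that two missing alternates never count as a match.
    shared = {c for c in codes_a if c} & {c for c in codes_b if c}
    return "weak" if shared else None


# Codes copied from the example records, not computed here.
CODES = {
    "Smith": ("SM0", "XMT"),
    "Smythe": ("SM0", "XMT"),
    "Schmidt": ("XMT", "SMT"),
}

smith_smythe = match_strength(CODES["Smith"], CODES["Smythe"])
smith_schmidt = match_strength(CODES["Smith"], CODES["Schmidt"])
```

Smith and Smythe come out as a strong match, Smith and Schmidt as a weak one. Whether a weak match is accepted would then depend on the other attributes in the key agreeing.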

Of course, European names also bring accented characters, and if you are a German speaker you might be relieved to note that Strasser and Straßer both share a primary metaphone code of STRS. Well, it pleases us anyway - duplication is a problem you don't want in your models, whatever the language!

 

 

Gavin Ritchie
Gavin Ritchie is the IT Director of Longevitas
Metaphone in Longevitas
Longevitas users can choose whether or not to apply metaphone algorithms during deduplication. Simply go to the Configuration section and open the Deduplication tab. There you will also have the option to select deduplication schemes with or without metaphone encoding.