What's in a (file)name?

The upcoming EU General Data Protection Regulation (GDPR) focuses on the potential for personal data exposures to create a risk to the rights of natural persons. The best way to reduce such risk is to minimise the ability to identify individuals from the data you use in your analysis. Thankfully, not all data used for modelling runs the risk of identifying individuals. Group data, such as that used by Longevitas group count survival models, or the grouped death and exposure formats used within the Projections Toolkit service, are not personal data under the terms of the GDPR. Such data carry no risk of identifying individuals. However, individual data used within mortalityrating.com, and within Longevitas individual-level survival models, may, depending on content, be classified as personal data.

Software adopts various technical measures to minimise the risk that individuals can be identified. Such mechanisms, including encryption, multi-factor authentication, and pseudonymisation, are all valuable. A more fundamental technique to guard against personal data risk is to remove unnecessary data elements to reduce (preferably to zero!) the number of ways an individual might be traced from the data shared and processed. This might be thought of as a variation on the popular security concept of Need to Know. If a calculation, such as a rating or a survival model, doesn't require a piece of knowledge, then our goal should be to remove that knowledge from the process. How can you avoid combining postcodes and dates of birth? How can you avoid combining names and sensitive codes? Questions such as these were the focus of our previous blog on the latest release of mortalityrating.com, and the Transform on Download feature available since February 2016.
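As a rough illustration of the Need to Know idea, the sketch below keeps only the fields a calculation requires and drops everything else before the data is shared. The field names, the sample values and the `minimise_record` helper are hypothetical, chosen purely for illustration; they are not the schema or API of any Longevitas service.

```python
# Hypothetical whitelist: the only fields this (assumed) calculation needs.
REQUIRED_FIELDS = ("gender", "year_of_birth", "annual_pension")


def minimise_record(record, required=REQUIRED_FIELDS):
    """Return a copy of record holding only the required fields."""
    return {name: record[name] for name in required if name in record}


# Example record with made-up values.
full_record = {
    "name": "A. N. Other",
    "postcode": "ZZ99 9ZZ",
    "gender": "F",
    "year_of_birth": 1950,
    "annual_pension": 12000.0,
}

minimal_record = minimise_record(full_record)
print(minimal_record)  # name and postcode are no longer present
```

A whitelist is used here rather than a blacklist, so that any field later added to the source file is excluded by default instead of leaking through.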

However, we should not forget aspects that are seemingly more mundane. What knowledge is encoded in uploaded file names and file descriptions? Clearly, if we use publicly recognisable references for pension schemes or annuity portfolios, that piece of context may, when combined with other fields in the dataset, make it easier to identify individuals. Identifying the dataset member who is oldest, youngest or has the highest or lowest pension may be made easier by knowing the source of their annuity, and is certainly made easier with knowledge of the organisation paying their defined-benefit pension. For this reason, our latest GDPR updates focus on such details in two ways:

  1. On file upload the system will propose a random, neutral description that can be retained or overtyped.
  2. The system will discard all knowledge of the original file name and rely only upon the user-supplied description.
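The two changes above might be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by the services: the function names, the `Upload-` prefix and the stored fields are all hypothetical.

```python
import secrets
from typing import Optional


def propose_description(prefix: str = "Upload") -> str:
    """Propose a random, neutral description carrying no scheme or client name."""
    # token_hex(4) returns 8 random hexadecimal characters.
    return f"{prefix}-{secrets.token_hex(4)}"


def register_upload(original_filename: str, content: bytes,
                    user_description: Optional[str] = None) -> dict:
    """Build the stored metadata for an upload (hypothetical structure).

    The original file name is accepted but deliberately never stored;
    only the description (proposed or overtyped by the user) is kept.
    """
    description = user_description or propose_description()
    return {"description": description, "size_bytes": len(content)}


# The user accepts the proposed description.
stored = register_upload("AcmePensionScheme_2017.csv", b"id,amount\n1,100\n")

# The user overtypes the proposal with their own neutral text.
renamed = register_upload("AcmePensionScheme_2017.csv", b"id,amount\n",
                          user_description="Portfolio A")
```

In this sketch the recognisable file name never reaches the stored record, so it cannot later be combined with other fields in the dataset.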

These changes are already in place for the latest releases of mortalityrating.com, Longevitas and the Projections Toolkit. Contact us if you need further information.




Gavin Ritchie
Gavin Ritchie is the IT Director of Longevitas.

Not all Longevitas services work at the individual level or process potentially personal data. However, the software incorporates a number of features and techniques to minimise the need for personal data even within individual modelling and rating calculations. Key features like "Transform on Download" and "Postcode Proxies" can anonymise postcodes, names and dates of birth. This retains the benefits of modelling individual lifetimes, but without uploading records that can identify individuals. And of course, our services operate strong authentication and encryption of uploaded data along with a variety of other technical measures.