Less is More: when weakness is a strength

A mathematical model that obtains extensive and useful results from the fewest and weakest assumptions possible is a compelling example of the art. A survival model is a case in point. The only material assumption we make is the existence of a hazard rate, \(\mu_{x+t}\), a function of age \(x+t\) such that the probability of death in a short time \(dt\) after age \(x+t\), denoted by \({}_{dt}q_{x+t}\), is:

\[{}_{dt}q_{x+t} = \mu_{x+t}dt + o(dt)\qquad (1)\]

(see Stephen's earlier blog on this topic). It would be hard to think of a weaker mathematical description of mortality as an age-related process. But from it much follows:

  • If we observe a life aged \(x_i\) for a time \(t_i\), and define \(d_i = 1\) if the life died at age \(x_i+t_i\), and \(d_i = 0\) otherwise (i.e. if our observation was right-censored at age \(x_i+t_i\)), the probability of this observation can be written compactly as:

\[L_i = \exp\left(-\int_0^{t_i}\mu_{x_i+s}ds\right)\mu^{d_i}_{x_i+t_i}.\qquad (2)\]

See Chapter 5 of our new book, Modelling Mortality with Actuarial Applications, where this expression is derived. Notably, it requires neither an assumption about any particular formula for \(\mu_{x+t}\), nor any statistical model for the number of deaths.
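
For readers who want the intermediate step, here is a brief sketch of how (2) follows from (1); the full treatment is in the chapter cited above. Write \({}_tp_x\) for the probability that a life aged \(x\) survives to age \(x+t\) (standard actuarial notation, introduced here only for this sketch). Surviving to age \(x+t+dt\) means surviving to age \(x+t\) and then not dying in the next \(dt\), so by (1):

\[{}_{t+dt}p_x = {}_tp_x\left(1 - \mu_{x+t}dt + o(dt)\right).\]

Rearranging, dividing by \(dt\) and letting \(dt \to 0\) gives \(\frac{d}{dt}{}_tp_x = -{}_tp_x\,\mu_{x+t}\), and since \({}_0p_x = 1\):

\[{}_tp_x = \exp\left(-\int_0^t\mu_{x+s}ds\right).\]

A censored observation (\(d_i = 0\)) contributes the probability of surviving \(t_i\) years from age \(x_i\). An observed death (\(d_i = 1\)) contributes that survival probability multiplied by \(\mu_{x_i+t_i}dt\); dropping the factor \(dt\), which does not depend on any parameters, gives (2).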

  • Want to fit a model to a portfolio with data for each individual life? Then (2) gives you the contribution of the \(i^{\rm th}\) life to the likelihood.
  • Have a particular parametric model in mind (Gompertz, Makeham, Perks, G-M family, etc.)? No problem, just plug your assumptions into (2) and reach for your favourite maximisation routine. This also requires no statistical model for the number of deaths.
  • Want to estimate the hazard rate at single years of age (useful for checking model fit, fitting a generalised linear model, or projecting mortality)? Then aggregate the observations made between ages \(x\) and \(x+1\), resulting in \(d_x\) observed deaths and \(E^c_x\) person-years of time exposed to the risk of death. Then from (2) we can show that the random number of deaths, \(D_x\) (of which \(d_x\) is the observed value), is, to a very good approximation, a Poisson random variable with parameter \(E^c_x\mu_{x+\frac{1}{2}}\). So now we do have a statistical model for the number of deaths but, as shown in an earlier blog, it is not an a priori assumption: it is a property that emerges from (1).
  • Want to incorporate a vector of covariates \(z_i\) for the \(i^{\rm th}\) life? (For example \(z_i\) could describe sex, smoking status or benefit amount.) Either stratify the data, fitting a separate hazard-rate model for each value of the covariates, or model the data, defining a hazard rate, \(\mu_{z,x+t}\), as a function of the covariates as well as age, and fitting a model to all the data (see Chapter 7 of our book). In either case, putting the hazard rates into (2) gives the likelihood.
  • Need to model more than just 'alive' or 'dead'? Assume that movements between any two of a number of states (e.g. healthy, sick, dead) are governed by a hazard rate analogous to (1). This leads to Markov multiple-state models (see Chapters 14–17 of our book).
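
As a rough illustration of the second and third uses above, the following Python sketch plugs a Gompertz hazard into (2) and also aggregates the same individual records to single years of age. Everything specific in it is an assumption made for illustration: the parameterisation \(\mu_x = \exp(\alpha + \beta x)\), the function names, the five made-up lives, and the crude grid search that stands in for a proper maximisation routine. It is not the method used in the book or in any particular software.

```python
import math

def gompertz_hazard(age, alpha, beta):
    """Illustrative Gompertz hazard: mu_x = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * age)

def integrated_hazard(x, t, alpha, beta):
    """Closed-form integral of the Gompertz hazard from age x to x + t (beta != 0)."""
    return math.exp(alpha + beta * x) * math.expm1(beta * t) / beta

def log_likelihood(lives, alpha, beta):
    """Sum of log L_i from (2); each life is a tuple (x_i, t_i, d_i)."""
    total = 0.0
    for x, t, d in lives:
        total -= integrated_hazard(x, t, alpha, beta)
        if d == 1:
            total += math.log(gompertz_hazard(x + t, alpha, beta))
    return total

def crude_hazards(lives):
    """Aggregate to single years of age: return {x: (d_x, E_x)}."""
    table = {}
    for x, t, d in lives:
        age, end, band = x, x + t, math.floor(x)
        while age < end:
            band = math.floor(age)            # integer age band [band, band + 1)
            upper = min(band + 1.0, end)      # leave the band or stop observing
            deaths, exposure = table.get(band, (0, 0.0))
            table[band] = (deaths, exposure + (upper - age))
            age = upper
        if d == 1:
            # Count the death in the last age band in which the life was exposed.
            deaths, exposure = table.get(band, (0, 0.0))
            table[band] = (deaths + 1, exposure)
    return table

# Made-up observations: (entry age x_i, years observed t_i, death indicator d_i).
lives = [(60.0, 5.0, 0), (65.0, 2.5, 1), (70.0, 4.0, 0), (72.0, 1.2, 1), (80.0, 3.0, 1)]

# Crude grid search over (alpha, beta); a real analysis would use a numerical optimiser.
grid = [(a / 10.0, b / 100.0) for a in range(-150, -49) for b in range(1, 21)]
alpha_hat, beta_hat = max(grid, key=lambda p: log_likelihood(lives, *p))
print("grid maximum:", alpha_hat, beta_hat, log_likelihood(lives, alpha_hat, beta_hat))

# Crude hazard estimates d_x / E_x at single years of age.
for x, (d_x, e_x) in sorted(crude_hazards(lives).items()):
    print(x, d_x, round(e_x, 3), d_x / e_x)
```

With only five invented lives the fitted values mean nothing, and the grid maximum may well sit on the edge of the grid. The point is only that (2) is all that is needed to write down the likelihood, and that the same records can be aggregated into deaths and exposure for the Poisson approach. In practice the negative log-likelihood would be passed to a numerical optimiser instead of a grid.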

All in all, a respectable outcome starting from such a weak assumption as (1). But perhaps this is not so surprising when we remember that (1) is also the assumption underlying the Poisson process, which seems to be one of nature's most fundamental models.

References:

Macdonald, A. S., Richards, S. J. and Currie, I. D. (2018). Modelling Mortality with Actuarial Applications, Cambridge University Press, Cambridge.

Written by: Angus Macdonald


