Everything points to Poisson

One recurring theme in our forthcoming book, Modelling Mortality with Actuarial Applications, is the all-pervading role of likelihoods that suggest the lurking presence of a Poisson distribution. A popular assumption in modelling hazard rates is that the number of deaths observed at any given age is a Poisson random variable, so perhaps that might explain it? Surprisingly, it is the other way round — it is the very nature of the data in a survival model that leads inexorably to the Poisson distribution, even if we assume no such thing.

Stripped back to basics, we observe \(n\) individuals and record our observations as:

The length of time \(E_i\) that the \(i\)th person was observed and alive; and
An indicator of death, \(d_i\), taking the value 1 if the \(i\)th person died and 0 if they were alive when observation ended.

If we assume the hazard rate \(\mu\) to be constant, we have a simple survival model, and we can write down the likelihood of the observation for the \(i\)th person. It is:

\[\Pr[{\rm Observed\ to\ survive\ for\ time\ }E_i]\qquad (1)\]

if \(d_i=0\), and:

\[\Pr[{\rm Observed\ to\ survive\ for\ time\ }E_i,{\rm then\ to\ die}]\qquad (2)\]

if \(d_i=1\). The probability of surviving stated above is just one of the actuary's life-table \({}_tp_x\)-type probabilities and since:

\[{}_tp_x = \exp \left( - \int_0^t \mu_{x+s} \, ds \right)\]

with a constant hazard rate \(\mu\) the integral reduces to \(E_i \mu\), so expression (1) is \(\exp( -E_i \mu )\). If \(d_i=1\) then expression (2) has an additional factor \(\mu\) (really \(\mu \, dt\), but the \(dt\) gets dropped from the likelihood). Now for the key step: a neat way to write the likelihood that covers both (1) and (2) is:

\[ L_i(\mu) \propto \exp( - E_i \mu ) \, \mu^{d_i}. \qquad \mbox{(3)} \]
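To see how expression (3) covers both cases, here is a minimal Python sketch. The function name, the exposure time and the value of the hazard rate are all hypothetical, chosen only for illustration.

```python
import math

def individual_likelihood(exposure: float, died: int, mu: float) -> float:
    """Expression (3): exp(-E_i * mu) * mu**d_i, with the dt factor dropped."""
    return math.exp(-exposure * mu) * mu ** died

mu = 0.05  # hypothetical constant hazard rate

# Survivor (d_i = 0): mu**0 = 1, leaving the survival probability of expression (1).
survivor = individual_likelihood(exposure=2.5, died=0, mu=mu)

# Death (d_i = 1): the survival probability times the extra factor mu, as in expression (2).
death = individual_likelihood(exposure=2.5, died=1, mu=mu)

print(survivor, death)
```

The indicator \(d_i\) in the exponent simply switches the extra factor \(\mu\) on or off.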

We are nearly there. Suppose all our observations are at some age \(x\). We then total the deaths and exposure times as follows:

\begin{eqnarray*} E^c_x & = & \sum_{i=1}^n E_i \\ d_x & = & \sum_{i=1}^n d_i \end{eqnarray*}

and the total likelihood (the product of all the individual factors, \(L_i(\mu)\)) is then:

\[ L(\mu) \propto \exp( - E^c_x \mu ) \, \mu^{d_x}. \]
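The collapse of the product into the two totals can be checked numerically. The sketch below uses a small made-up data set and an arbitrary value of \(\mu\); neither comes from real data.

```python
import math

# Hypothetical observations: (exposure time E_i, death indicator d_i).
observations = [(1.0, 0), (0.4, 1), (2.3, 0), (1.7, 1), (0.9, 0)]
mu = 0.3  # hypothetical constant hazard rate

# Product of the individual factors exp(-E_i * mu) * mu**d_i.
product_of_factors = math.prod(
    math.exp(-e * mu) * mu ** d for e, d in observations
)

# Aggregated form exp(-E^c_x * mu) * mu**d_x, using only the two totals.
total_exposure = sum(e for e, _ in observations)
total_deaths = sum(d for _, d in observations)
aggregated = math.exp(-total_exposure * mu) * mu ** total_deaths

print(math.isclose(product_of_factors, aggregated))
```

Under the constant-hazard assumption, the likelihood depends on the data only through the total exposure and the total number of deaths.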

When writing down a likelihood it is usual to drop any factors that don't depend directly on the parameter of interest, which here is \(\mu\). However, here we will do the opposite: we will insert a couple of such multiplicative factors, namely \((E^c_x)^{d_x}\) and \(1/d_x !\). The result of doing so does not change inference for \(\mu\) and it gives the following likelihood:

\[ L^*(\mu) \propto \frac{\exp( - E^c_x\mu ) (E^c_x\mu)^{d_x}}{d_x !}. \]

In other words, \(L^*(\mu)\) has the form of \(\Pr[D=d_x]\), as if the total number of deaths, \(D\), were a random variable with a Poisson distribution with parameter \(E^c_x \mu\). All of this is, of course, merely playing with a 'toy' model: real survival data have important features that we have ignored. But what emerges, as we build models that can handle real survival data, is the ubiquity of likelihoods built up from factors that look very much like expression (3) above. And that always tells us that the Poisson distribution can't be far away. Or the Poisson process, but that's another story, which, by happy coincidence, you can read in Part Three of our book.
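As a rough numerical check of the Poisson connection, the sketch below compares the survival-model log-likelihood with the Poisson log-probability over a few values of \(\mu\). The totals are hypothetical, and the function names are chosen here purely for illustration.

```python
import math

total_exposure = 6.3  # hypothetical E^c_x
total_deaths = 2      # hypothetical d_x

def survival_loglik(mu: float) -> float:
    """Log of exp(-E^c_x * mu) * mu**d_x."""
    return -total_exposure * mu + total_deaths * math.log(mu)

def poisson_logpmf(mu: float) -> float:
    """Log of Pr[D = d_x] for D ~ Poisson(E^c_x * mu)."""
    lam = total_exposure * mu
    return -lam + total_deaths * math.log(lam) - math.lgamma(total_deaths + 1)

mus = [0.05, 0.1, 0.3, 0.5, 1.0]
differences = [poisson_logpmf(m) - survival_loglik(m) for m in mus]

# The gap should be d_x * log(E^c_x) - log(d_x!), which does not involve mu.
expected_constant = (
    total_deaths * math.log(total_exposure) - math.lgamma(total_deaths + 1)
)
print(differences, expected_constant)
```

Because the two log-likelihoods differ only by a term free of \(\mu\), they are maximised at the same value of \(\mu\) and lead to the same inference.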

References:

Macdonald, A. S., Richards, S. J. and Currie, I. D. (2018) Modelling Mortality with Actuarial Applications, Cambridge University Press (to appear).

Written by: Angus Macdonald

Publication Date: 16 January 2018

Last Updated: 16 January 2018

Tags: survival data, Poisson distribution
