## See You Later, Indicator

A recurring feature in my previous blogs, such as this one on information, is the indicator process:

$Y^*(x)=\begin{cases}1\quad\mbox{ if a person is alive at age $x^-$}\\0\quad\mbox{ otherwise}\end{cases}$

where $$x^-$$ means immediately before age $$x$$ (never mind the asterisk for now).  When something keeps cropping up in any branch of mathematics or statistics, there are usually good reasons, and this is no exception.  Here are some:

• It is a stochastic process, as a function of age $$x$$. $$Y^*(x)$$ tracks whether or not the life is alive in continuous time.  $$Y^*(x)$$ has a familiar expected value: $${\rm E}[Y^*(x+t)|Y^*(x)=1]={}_tp_x$$, the survival probability.
• It leads to useful expected values, such as:

${\rm E}\left[ \int_0^{\infty} Y^*(x+t) \, dt \, \Big| \, Y^*(x) = 1 \right] = \int_0^{\infty} {}_tp_x \, dt, \qquad \mbox{(1)}$

the complete expectation of life.

• If $$\mu(x)$$ is the usual hazard rate, then the product $$Y^*(x) \, \mu(x)$$ is another stochastic process, of the form used in Aalen's multiplicative intensity model.  $$Y^*(x)$$ turns the deterministic function $$\mu(x)$$ into a function that is switched on or off, depending on the status of the life being observed.  Also:

$\begin{eqnarray}{\rm E}\left[ \int_0^n Y^*(x+t) \, \mu(x+t) \, dt \, \Big| \, Y^*(x) = 1 \right] &=& \int_0^n {}_tp_x \, \mu(x+t) \, dt\\ &=& {}_nq_x, \qquad \mbox{(2)}\end{eqnarray}$

the probability that a person now age $$x$$ will die before age $$x+n$$ (the second equality holds because $$\frac{d}{dt}\,{}_tp_x = -{}_tp_x \, \mu(x+t)$$; see Dickson et al. (2013)).  So, from equations (1) and (2), $$Y^*(x)$$ is a way to represent the randomness underlying a human lifetime, which, upon taking expectations, leads to the life table and all the actuary's familiar tools.

• A trivial-looking change to the definition, as follows:

$Y(x)=\begin{cases}1\quad{\rm if\ a\ person\ is\ alive\ {\it and\ under\ observation\ } at\ age\ }x^-\\0\quad\mbox{ otherwise}\end{cases}$

accommodates left-truncated and right-censored observations.  Since precisely these features (especially the latter) distinguish survival models from ordinary statistics, this is quite important.  $$Y(x)$$ is a stochastic representation of left-truncated and right-censored observations, just as a random variable $$X$$ might be a stochastic representation of an experiment, like tossing a coin.

• Integrated over $$t$$, $$Y(x+t)$$ gives the person-years of exposure to risk of the life being observed, subject to left-truncation and right-censoring:

$\mbox{Exposure time} = \int_{0}^{\infty} Y(x+t) \, dt \qquad \mbox{(3)}$

(note the limits of integration).  Thus $$Y(x)$$ is closely linked with a basic quantity used in a survival model.  Incidentally, this should remind us that the exposed-to-risk familiar to actuaries is a random variable, not a deterministic quantity.

• The process $$Y(x)$$ generalizes equally trivially to a multiple-state model, in which a person might move between several states, for example representing states of health.  If there are $$M$$ states labelled $$1, 2, \ldots, M$$, define:

$Y^j(x)=\begin{cases}1\quad\mbox{ if a person is in state $j$ and under observation at age $x^-$} \\0\quad\mbox{ otherwise}\end{cases}$

for $$j \in \{ 1, 2, \ldots, M\}$$.  Everything we can do in the simple alive-dead model, including expressions equivalent to equations (1) to (3) above, we can also do in the multiple-state setting, and the $$Y^j(x)$$ are the reason why.
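The identities in equations (1) to (3) can be checked by simulation. The sketch below assumes a Gompertz hazard $$\mu(x) = B\,c^x$$ with made-up parameters (`B` and `C` in the code), a starting age of 60 and a 15-year observation window; all of these are hypothetical choices for illustration. Because $$Y^*(x+t) = 1$$ exactly when $$t \le T$$, where $$T$$ is the future lifetime, each integral of the indicator process reduces to a simple function of $$T$$, and its average over many simulated lives can be compared with the corresponding life-table quantity.

```python
import math
import random

# Hypothetical Gompertz hazard mu(y) = B * C**y. B and C are illustrative
# values chosen for this sketch, not taken from any published table.
B, C = 5e-5, 1.1
LOG_C = math.log(C)


def cum_hazard(x, t):
    """Integrated hazard int_0^t mu(x + s) ds, in closed form for Gompertz."""
    return B * C**x * (C**t - 1.0) / LOG_C


def survival(x, t):
    """Survival probability t_p_x = exp(-integrated hazard)."""
    return math.exp(-cum_hazard(x, t))


def future_lifetime(x, rng):
    """Draw T with P(T > t) = t_p_x, using the fact that the integrated
    hazard up to the moment of death is exponential with mean 1."""
    e = rng.expovariate(1.0)
    return math.log(1.0 + e * LOG_C / (B * C**x)) / LOG_C


def trapezoid(f, a, b, steps=20_000):
    """Composite trapezoidal rule for a smooth integrand on [a, b]."""
    h = (b - a) / steps
    inner = sum(f(a + k * h) for k in range(1, steps))
    return h * (0.5 * (f(a) + f(b)) + inner)


def check_identities(x=60.0, n=10.0, censor=15.0, lives=100_000, seed=1):
    """Monte Carlo averages of integrals of the indicator process, paired
    with the life-table quantities they should estimate."""
    rng = random.Random(seed)
    lifetimes = [future_lifetime(x, rng) for _ in range(lives)]
    # Y*(x + t) = 1 exactly when t <= T, so int_0^inf Y*(x + t) dt = T; eq. (1).
    mc_e = sum(lifetimes) / lives
    # int_0^n Y*(x + t) mu(x + t) dt is the integrated hazard up to min(T, n); eq. (2).
    mc_q = sum(cum_hazard(x, min(T, n)) for T in lifetimes) / lives
    # Hypothetical observation scheme: each life enters at age x (left-truncation)
    # and is censored `censor` years later, so Y(x + t) = 1 while t <= min(T, censor)
    # and the exposure in eq. (3) is min(T, censor).
    mc_exposure = sum(min(T, censor) for T in lifetimes) / lives
    # An upper limit of 100 years stands in for infinity; survival is negligible by then.
    return {
        "e_x": (mc_e, trapezoid(lambda t: survival(x, t), 0.0, 100.0)),
        "n_q_x": (mc_q, 1.0 - survival(x, n)),
        "exposure": (mc_exposure, trapezoid(lambda t: survival(x, t), 0.0, censor)),
    }


for name, (simulated, life_table) in check_identities().items():
    print(f"{name:>8}: simulated {simulated:.4f}, life table {life_table:.4f}")
```

Each simulated average should agree with its life-table value up to Monte Carlo error. The third pair also illustrates that the exposure is a random variable: its expectation here is the temporary expectation of life $$\int_0^{15} {}_tp_{60} \, dt$$.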

As an example of the usefulness of the indicator process, I will refer to one of my favourite results in our new book, Modelling Mortality with Actuarial Applications (it is in Section 5.7).  If we model the number of deaths at age $$x$$, denoted by $$d_x$$, given $$E^c_x$$ person-years of exposure to risk, as a Poisson random variable with parameter $$E^c_x \, \mu(x+1/2)$$ (as actuaries often do), we get a likelihood (strictly, a log-likelihood with constant terms omitted) of the form:

$\ell^*(\mu) = - \sum_x E^c_x \, \mu(x+1/2) + \sum_x d_x \, \log(\mu(x+1/2)) \qquad \mbox{(4)}$

where the sums are over all ages $$x$$.  If, however, we model each individual life by proposing that they are subject to the hazard rate $$Y(x) \, \mu(x)$$, we get a likelihood of the form:

$\ell(\mu) = - \sum_i \int_0^{t_i} \mu(x_i+t) \, dt + \sum_i d_i \, \log(\mu(x_i+t_i)) \qquad \mbox{(5)}$

where the sums are over all individual persons, labelled $$i$$, and the entry age $$x_i$$, the exit age $$x_i+t_i$$ and the number of deaths $$d_i$$ (equal to zero or one) are particular to the $$i$$th individual. (These appear as equations (5.25) and (5.26) in our book.)  The second approach is the basic idea of a survival model.  In fact we can show that these two likelihoods are, fundamentally, the same, and the key to the short and elegant proof in our book is the process $$Y(x)$$ defined above. The proof also shows that equation (5) is exact and (4) is an approximation to it.
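To see the closeness of equations (4) and (5) in numbers, here is a sketch on simulated data. Everything about the data is hypothetical: a Gompertz hazard $$\mu(x) = B\,c^x$$ with made-up parameters, and 5,000 lives entering between ages 60 and 70 and observed for at most 5 years. The sketch evaluates (5) life by life and (4) after grouping exposure and deaths by integer age, and it also maximises each over $$B$$ with $$c$$ held fixed. It is a numerical illustration only, not the proof in our book.

```python
import math
import random
from collections import defaultdict

# Hypothetical Gompertz hazard mu(y) = b * C**y; the default b = B and the
# value of C are illustrative choices for this sketch only.
B, C = 5e-5, 1.1
LOG_C = math.log(C)


def mu(age, b=B):
    return b * C**age


def int_mu(a, z, b=B):
    """int_a^z mu(y) dy in closed form."""
    return b * (C**z - C**a) / LOG_C


def simulate_lives(n_lives=5_000, max_obs=5.0, seed=2):
    """Hypothetical study: entry ages uniform on [60, 70), each life observed
    from entry (left-truncation) for at most max_obs years (right-censoring).
    Returns a list of (x_i, t_i, d_i), one tuple per individual."""
    rng = random.Random(seed)
    lives = []
    for _ in range(n_lives):
        entry = rng.uniform(60.0, 70.0)
        e = rng.expovariate(1.0)  # integrated hazard from entry to death ~ Exp(1)
        t_death = math.log(1.0 + e * LOG_C / mu(entry)) / LOG_C
        lives.append((entry, min(t_death, max_obs), int(t_death < max_obs)))
    return lives


def loglik_individual(lives, b=B):
    """Equation (5): -sum_i int mu over [x_i, x_i + t_i] + sum_i d_i log mu(x_i + t_i)."""
    return sum(-int_mu(x, x + t, b) + d * math.log(mu(x + t, b))
               for x, t, d in lives)


def grouped(lives):
    """Central exposure E^c_x and deaths d_x at each integer age x."""
    exposure, deaths = defaultdict(float), defaultdict(int)
    for x, t, d in lives:
        start, end = x, x + t
        age = math.floor(start)
        while age < end:
            exposure[age] += min(end, age + 1) - max(start, age)
            age += 1
        deaths[math.floor(end)] += d
    return exposure, deaths


def loglik_poisson(exposure, deaths, b=B):
    """Equation (4): -sum_x E^c_x mu(x + 1/2) + sum_x d_x log mu(x + 1/2)."""
    ages = set(exposure) | set(deaths)
    return sum(-exposure[x] * mu(x + 0.5, b) + deaths[x] * math.log(mu(x + 0.5, b))
               for x in ages)


lives = simulate_lives()
exposure, deaths = grouped(lives)
total_deaths = sum(d for _, _, d in lives)
# With C held fixed, both log-likelihoods have the form -b*A + D*log(b) + const,
# so each is maximised at b = D / A, with A computed exactly (5) or grouped (4).
b_hat_individual = total_deaths / sum(int_mu(x, x + t, 1.0) for x, t, _ in lives)
b_hat_poisson = total_deaths / sum(e * mu(x + 0.5, 1.0) for x, e in exposure.items())

print(f"log-likelihood at true B: eq (5) {loglik_individual(lives):.2f}, "
      f"eq (4) {loglik_poisson(exposure, deaths):.2f}")
print(f"MLE of B with C fixed: eq (5) {b_hat_individual:.4e}, eq (4) {b_hat_poisson:.4e}")
```

The two log-likelihoods, and the two estimates of $$B$$, should differ only slightly. In this sketch the gap comes entirely from evaluating $$\mu$$ at mid-year ages $$x+1/2$$ in (4), rather than at the exact ages of exposure and death used in (5).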

This is truly remarkable. The assumption underlying the survival model, and the likelihood in equation (5), is as bare-bones as can be: at age $$x$$ the hazard rate is $$Y(x) \, \mu(x)$$.  No collecting together of all lives of the same age, no random variable, no assumption of any distribution.  Yet we get, essentially, a Poisson likelihood.  What this tells us is that the bare-bones assumption underlying the survival model is deeply fundamental.

• As we saw in a previous blog, if the hazard rate is $$\mu(x)$$, we get a Poisson process (from which the Poisson random variables underlying the likelihood in equation (4) ultimately derive).
• If the hazard rate is $$Y(x) \, \mu(x)$$, we get a process that jumps at most once, but is subject to left-truncation and right-censoring; again, a plausible model of real survival data.
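The contrast between these two processes can be simulated. This sketch again assumes a hypothetical Gompertz hazard with made-up parameters, and a made-up observation window from age 80 to age 90. The process with intensity $$\mu(x)$$ keeps jumping, while the process with intensity $$Y(x) \, \mu(x)$$ jumps at most once: its first jump has the same distribution as the Poisson process's first jump, after which $$Y(x) = 0$$ switches the intensity off.

```python
import math
import random

# Hypothetical Gompertz hazard mu(y) = B * C**y (illustrative parameters only).
B, C = 5e-5, 1.1
LOG_C = math.log(C)


def int_mu(a, z):
    """int_a^z mu(y) dy in closed form."""
    return B * (C**z - C**a) / LOG_C


def next_jump(age, rng):
    """Age of the next jump after `age` under intensity mu: the integrated
    intensity between successive jumps is exponential with mean 1."""
    e = rng.expovariate(1.0)
    return math.log(C**age + e * LOG_C / B) / LOG_C


def poisson_jumps(entry, exit_, rng):
    """Jump ages of a Poisson process with intensity mu(x) on [entry, exit_)."""
    jumps, age = [], entry
    while True:
        age = next_jump(age, rng)
        if age >= exit_:
            return jumps
        jumps.append(age)


def indicator_jumps(entry, exit_, rng):
    """Jump ages under intensity Y(x) mu(x). Y(x) = 1 from entry (left-truncation)
    until the first jump (death) or exit_ (right-censoring), and 0 afterwards,
    so at most one jump can occur."""
    age = next_jump(entry, rng)
    return [age] if age < exit_ else []


rng = random.Random(3)
entry, exit_, runs = 80.0, 90.0, 50_000
poisson_counts = [len(poisson_jumps(entry, exit_, rng)) for _ in range(runs)]
indicator_counts = [len(indicator_jumps(entry, exit_, rng)) for _ in range(runs)]
H = int_mu(entry, exit_)

print(f"mu(x):      mean jumps {sum(poisson_counts) / runs:.3f} "
      f"(theory {H:.3f}), max {max(poisson_counts)}")
print(f"Y(x) mu(x): mean jumps {sum(indicator_counts) / runs:.3f} "
      f"(theory {1 - math.exp(-H):.3f}), max {max(indicator_counts)}")
```

Over the window, the mean number of Poisson jumps should be close to $$\int_{80}^{90} \mu(y) \, dy$$, while the mean for the $$Y(x) \, \mu(x)$$ process should be close to $${}_{10}q_{80} = 1 - \exp\left(-\int_{80}^{90} \mu(y) \, dy\right)$$, with no simulated path showing more than one jump.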

And so we end up with things that look Poisson. That is another recurring feature of these blogs and, like the modified Poisson process itself (which stops after its first jump), a good place to stop.

References

Dickson, D.C.M., Hardy, M.R. and Waters, H.R. (2013). Actuarial Mathematics for Life Contingent Risks (second edition). Cambridge University Press.

Macdonald, A.S., Richards, S.J. and Currie, I.D. (2018). Modelling Mortality with Actuarial Applications. Cambridge University Press.