## One small step

When fitting mortality models, the log-likelihood function is the foundation of modern statistical inference. The point at which the log-likelihood attains its maximum gives the maximum-likelihood estimates of the parameters, while the curvature of the log-likelihood at that point determines the standard errors of those estimates.

The log-likelihood function is maximised where its gradient is zero, so one requires the first derivatives of the function with respect to each parameter. Similarly, the curvature is measured by the second partial derivatives with respect to each possible pair of parameters. In both cases, one requires either the derivatives themselves, or else a numerical approximation to them.
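As a minimal generic sketch (not the authors' code, and not tied to any particular mortality law), the gradient and Hessian of a log-likelihood can be approximated by central differences with step size *h*:

```python
import numpy as np

def grad_approx(f, theta, h=1e-5):
    """Central-difference approximation to the gradient of f at theta."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2.0 * h)
    return g

def hessian_approx(f, theta, h=1e-5):
    """Central-difference approximation to the Hessian of f at theta."""
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            # Four-point formula for the (i, j) second partial derivative.
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h * h)
    return H
```

Standard errors are then approximated from the inverse of the negative Hessian at the maximum; as the table below shows, the quality of that approximation depends heavily on the choice of *h*.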

Numerical approximations to the derivatives can be obtained relatively straightforwardly by numerical differentiation. This involves perturbing a parameter by a small value, *h* say, and expressing the change in function value relative to *h*. In theory, the smaller the value of *h*, the closer the numerical approximation will be to the actual derivative. Such numerical derivatives have the advantage of avoiding the need to work out large numbers of formulaic derivatives for each mortality law. By way of illustration, Table 1 shows the estimate and approximate standard error for one parameter in a simple Gompertz model using numerical differentiation:

Table 1. Estimated value and approximate standard error for **Intercept** parameter in a simple Age+Gender Gompertz model for mortality. Numerical differentiation is used with varying values for the step size, *h*.

| *h* | Intercept | Std. Err. |
|-----------|-----------|-----------|
| 0.001 | -13.0085 | 0.119561 |
| 0.0001 | -13.0404 | 0.116770 |
| 0.00001 | -13.0408 | 0.115219 |
| 0.000001 | -13.0408 | 0.119749 |
| 0.0000001 | -13.0408 | 0.008756 |

The estimate of the **Intercept** parameter in Table 1 is broadly consistent across the given values of *h*, with convergence apparently achieved for values of 0.00001 and smaller. However, there is no such convergence for the approximate standard error, which becomes unstable for very small values of *h*. The cause is likely to be the same sort of arithmetic underflow described in an earlier post.
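The same instability can be reproduced with a function whose second derivative is known exactly. The following illustration (not the authors' model) approximates the second derivative of exp(*x*) at *x* = 1, where the exact answer is e = 2.71828...:

```python
import math

def second_diff(f, x, h):
    """Central-difference approximation to f''(x) with step h."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

for h in (1e-3, 1e-4, 1e-5, 1e-6, 1e-7):
    approx = second_diff(math.exp, 1.0, h)
    print(f"h = {h:g}: f'' ~ {approx:.6f}, error = {abs(approx - math.e):.2e}")
```

Shrinking *h* reduces the truncation error, but the numerator shrinks towards the limits of floating-point precision, so round-off error eventually dominates and the approximation degrades, just as the Std. Err. column does for the smallest values of *h*.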

For true reliability, one has to do the maths and find the formulaic expressions for the derivatives. This has the advantage that points of possible arithmetic underflow can be identified and programmed around, and it usually also results in much faster fitting times. Using the formulaic derivatives, the estimated **Intercept** is -13.0408 with a standard error of 0.116768. From this we can see that *h*=0.0001 was the best-performing step size for the standard error in Table 1. This will not be the case for all models, however, and so the most reliable way to estimate a standard error is to work out the derivatives. Doing all the necessary maths is admittedly a lot of work, but, as this example shows, it is the fastest and most reliable way to fit the model parameters.
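As a hedged illustration of what "doing the maths" can look like, suppose a Poisson likelihood for deaths *d* with central exposures *E* (an assumption for this sketch; the authors' likelihood may differ, and the gender term is omitted). For a Gompertz hazard mu(*x*) = exp(alpha + beta*x*), the score and Hessian then have simple closed forms:

```python
import numpy as np

def gompertz_poisson_score_hessian(alpha, beta, x, d, E):
    """Analytic score vector and Hessian matrix for a Poisson
    log-likelihood l = sum(d*log(mu) - E*mu) with Gompertz hazard
    mu(x) = exp(alpha + beta*x)."""
    mu = np.exp(alpha + beta * x)
    r = d - E * mu                       # martingale-style residuals
    score = np.array([r.sum(), (x * r).sum()])
    w = E * mu                           # weights in the curvature terms
    hess = -np.array([[w.sum(),       (x * w).sum()],
                      [(x * w).sum(), (x * x * w).sum()]])
    return score, hess
```

With these expressions, standard errors come from the square roots of the diagonal of the inverse of the negative Hessian at the maximum, and there is no step size *h* to choose at all.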

### Comments