Choosing between models

In any model-fitting exercise you will be faced with choices. What shape of mortality curve to use? Which risk factors to include? How many size bands for benefit amount? In each case there is a balance to be struck between improving the model fit and making the model more complicated.

Our preferred method of measuring model fit is the log-likelihood function, but this on its own does not take account of model complexity. For example, it is usually possible to make a model fit better - i.e. increase the log-likelihood value - by adding extra parameters and risk factors. But is this extra complexity justified? Are those extra parameters and risk factors earning their keep in the model?

There are a number of different statistics which can be used to strike this balance: the Bayesian Information Criterion (BIC) is one, but our preference is Akaike's Information Criterion (AIC), which Akaike introduced in the early 1970s. The definition of the AIC is:

AIC = -2 * log-likelihood + 2 * number of parameters

Straight away we can see that the AIC includes the usual measure of goodness of fit, namely the log-likelihood. However, it also includes the number of parameters, so it can balance improved model fit against complexity. For a given data set, then, the preferred model is the one with the lowest value of the AIC.
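
As a rough illustration of the comparison described above, the following Python sketch computes the AIC for three candidate models and picks the one with the lowest value. The model names, log-likelihood values and parameter counts are invented purely for illustration and do not come from any real data set; the function name aic is likewise just a convenient label.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical fits to the same data set:
# model name -> (log-likelihood, number of parameters).
# These figures are made up for illustration only.
candidates = {
    "two-parameter model": (-1520.4, 2),
    "three-parameter model": (-1516.9, 3),
    "four-parameter model": (-1516.5, 4),
}

# Compute the AIC for each candidate model.
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}

# The preferred model is the one with the lowest AIC.
preferred = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("Preferred:", preferred)
```

In this made-up example the third parameter raises the log-likelihood by 3.5, which outweighs the penalty of 2 for the extra parameter, so the AIC falls. The fourth parameter raises the log-likelihood by only 0.4, so the AIC rises and the three-parameter model is preferred. More generally, the definition implies that an extra parameter lowers the AIC only if it increases the log-likelihood by more than 1.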