Enhancement

(Jun 1, 2013)

An oft-overlooked aspect of statistical models is that parameter estimates are not independent of each other.  Ignoring such dependencies can have important consequences, and in extreme cases can even undermine the assumptions of a forecasting model.  However, in a regression model, correlations between regressor variables can sometimes have unexpectedly positive results.  To illustrate this, consider a sequence of fits of a survival model for a Makeham-Perks mortality law (Richards, 2008), defined as follows:

$$\mu_x = \frac{e^{\epsilon} + e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

where x is age and the parameter α is allowed to vary by gender, health status at retirement, or both.  The results…
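The Makeham-Perks formula translates directly into code. The sketch below is a minimal Python version, assuming a simple additive structure for α: a baseline plus 0/1 shifts for females and for ill-health retirees. That structure, the helper name `alpha_for` and all parameter values are hypothetical, chosen only for illustration; they are not the fitted values from the post.

```python
import math

def makeham_perks_hazard(x, alpha, beta, epsilon):
    """Makeham-Perks force of mortality at age x:
    [exp(epsilon) + exp(alpha + beta*x)] / [1 + exp(alpha + beta*x)]."""
    g = math.exp(alpha + beta * x)  # the age-dependent exp(alpha + beta*x) term
    return (math.exp(epsilon) + g) / (1.0 + g)

def alpha_for(female, ill_health, alpha_base, alpha_female, alpha_ill):
    """Hypothetical additive alpha: a baseline plus 0/1 shifts for
    gender and for health status at retirement."""
    return alpha_base + alpha_female * female + alpha_ill * ill_health

# Hypothetical parameter values, for illustration only.
BETA, EPSILON = 0.12, -6.0
ALPHA_BASE, ALPHA_FEMALE, ALPHA_ILL = -12.0, -0.4, 0.5

# Force of mortality at age 70 for each gender/health combination.
for female in (0, 1):
    for ill in (0, 1):
        a = alpha_for(female, ill, ALPHA_BASE, ALPHA_FEMALE, ALPHA_ILL)
        mu = makeham_perks_hazard(70, a, BETA, EPSILON)
        print(f"female={female} ill={ill} mu_70={mu:.5f}")
```

At young ages the exp(α + βx) term is tiny and the hazard is close to the constant exp(ε); at very high ages the hazard tends towards 1. Because every group shares β and ε, the estimate of each group's α shift is not independent of the estimates of those shared parameters.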

Factors

(May 5, 2009)

In statistical terminology, a factor is a categorisation which contains two or more mutually exclusive values called levels.  These levels may have a natural order, in which case the variable is said to be an ordinal factor.  An example might be year of birth: 1931 must lie between 1930 and 1932.  Another example would be benefit size band: the 9th decile of sums assured must lie between the 8th and 10th deciles.

In contrast to ordinal factors, a categorical factor is a variable where the levels do not have an obvious order.  An example of a categorical factor would be gender: all you can say about females is that they are categorically different from males, but whether you list males before females or vice versa is…
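The distinction can be sketched with Python's standard `enum` module, as one possible representation rather than a prescribed one: an ordinal factor as an `IntEnum`, whose levels compare by order, and a categorical factor as a plain `Enum`, whose levels do not. The `treatment_code` helper is hypothetical. It shows one common way a categorical factor enters a regression model, namely as 0/1 indicators for every level except a chosen baseline.

```python
from enum import Enum, IntEnum

# Ordinal factor: benefit size band by decile, levels 1..10 in natural order.
SizeBand = IntEnum("SizeBand", [f"DECILE_{i}" for i in range(1, 11)])

class Gender(Enum):
    # Categorical factor: levels are distinct but have no inherent order.
    FEMALE = "F"
    MALE = "M"

def treatment_code(level, levels, baseline):
    """Hypothetical helper: 0/1 indicators for each non-baseline level."""
    return [1 if level == lv else 0 for lv in levels if lv != baseline]

# The 9th decile lies between the 8th and 10th deciles.
print(SizeBand.DECILE_8 < SizeBand.DECILE_9 < SizeBand.DECILE_10)

# With females as the baseline, gender becomes a single indicator for males.
print(treatment_code(Gender.MALE, list(Gender), Gender.FEMALE))
```

Ordering comparisons between `Gender` members raise `TypeError`, which mirrors the point that the order in which categorical levels are listed carries no meaning.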

Degrees of freedom

(Sep 16, 2008)

In an earlier post questioning whether we still need standard tables, we used the AIC to choose between models.  For the AIC we assign a "cost" to the inclusion of extra parameters, and I had counted each mortality rate actually used in the model-fitting as a parameter.  However, there is no single agreed approach for this, and there are a number of defensible positions which could be taken:

1. Count each mortality rate in the table as a parameter.  The argument for this is that the table is supposed to be selected before any model is fitted.  However, it seems harsh to count rates which are not actually used.
2. Count each mortality rate actually used as a parameter.  This is my preferred approach, but the rates themselves…
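The effect of the counting convention on the AIC can be shown with a small sketch. It assumes a hypothetical standard table with rates for ages 20 to 120, of which only ages 60 to 100 are used in the fit, and a hypothetical maximised log-likelihood; it also ignores any parameters fitted on top of the table.

```python
def aic(log_likelihood, n_parameters):
    """Standard AIC: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_parameters

# Hypothetical example values.
table_ages = range(20, 121)  # 101 rates in the table
used_ages = range(60, 101)   # 41 rates actually used in the fit
log_lik = -5000.0            # hypothetical maximised log-likelihood

aic_all_rates = aic(log_lik, len(table_ages))   # position 1: every rate in the table
aic_used_rates = aic(log_lik, len(used_ages))   # position 2: only the rates used

print(aic_all_rates, aic_used_rates)
```

The log-likelihood is the same under both conventions, so the whole difference in AIC (here 2 × 60 = 120) comes from how the parameters are counted, which is why the choice matters when comparing against a model with few parameters.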

Choosing between models

(Aug 13, 2008)

In any model-fitting exercise you will be faced with choices. What shape of mortality curve to use? Which risk factors to include? How many size bands for benefit amount? In each case there is a balance to be struck between improving the model fit and making the model more complicated.

Our preferred method of measuring model fit is the log-likelihood function, but this on its own does not take account of model complexity. For example, it is usually possible to make a model fit better (i.e. increase the log-likelihood value) by adding extra parameters and risk factors. But is this extra complexity justified? Are those extra parameters and risk factors earning their keep in the model?
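A minimal sketch of why extra parameters rarely hurt the fit: with hypothetical binomial data for two groups, a model with a separate mortality rate per group can never have a lower maximised log-likelihood than a model with one common rate, because the common-rate model is a special case of it. The group data and the helper name `binomial_loglik` are assumptions for illustration.

```python
import math

def binomial_loglik(deaths, lives, q):
    """Binomial log-likelihood, omitting the constant combinatorial term."""
    return deaths * math.log(q) + (lives - deaths) * math.log(1.0 - q)

# Hypothetical data: group -> (deaths, lives)
groups = {"A": (30, 1000), "B": (36, 1000)}

# Model 1: one parameter, a common mortality rate (its MLE is pooled deaths / lives).
total_deaths = sum(d for d, _ in groups.values())
total_lives = sum(n for _, n in groups.values())
q_common = total_deaths / total_lives
loglik_1 = sum(binomial_loglik(d, n, q_common) for d, n in groups.values())

# Model 2: two parameters, a separate rate for each group (MLE d / n per group).
loglik_2 = sum(binomial_loglik(d, n, d / n) for d, n in groups.values())

print(loglik_1, loglik_2, loglik_2 - loglik_1)
```

With these numbers the second parameter improves the log-likelihood by less than one unit, so under the AIC's penalty of 2 per parameter it would not earn its keep.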

There are a number of different…

Tags: AIC, log-likelihood, model fit

Choosing between models - a business view

(Aug 13, 2008)

We have previously discussed how we use the AIC to choose between models. The standard definition of the AIC is:

AIC = -2 * log-likelihood + 2 * number of parameters

However, this is a statistician's view of a model, where the only criterion for including a parameter is whether it is statistically significant. A business view might be different, as each extra parameter in a system will cost you money. IT systems have to be specified, programmed, tested and maintained, for example, and IT staff are not cheap. Each extra parameter might therefore cost you £5,000 in development costs (say), so you might be inclined to include only those parameters that are highly significant. One way of doing this is to increase the penalty for the number…
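The idea of a heavier penalty can be sketched as a criterion of the form -2 × log-likelihood + penalty × number of parameters, where a penalty of 2 recovers the standard AIC. The function names, the two candidate fits and the penalty of 10 are all hypothetical; they are not the post's recommended values.

```python
def penalised_criterion(log_likelihood, n_parameters, penalty=2.0):
    """-2 * log-likelihood + penalty * number of parameters.

    penalty=2 gives the standard AIC; a larger penalty is a hypothetical
    way of reflecting the business cost of each extra parameter.
    """
    return -2.0 * log_likelihood + penalty * n_parameters

def preferred(models, penalty=2.0):
    """Return the name of the model with the lowest criterion value."""
    return min(models, key=lambda name: penalised_criterion(*models[name], penalty=penalty))

# Hypothetical fits: name -> (maximised log-likelihood, number of parameters)
models = {"simpler": (-1000.0, 5), "richer": (-997.5, 6)}

print(preferred(models))                # standard AIC
print(preferred(models, penalty=10.0))  # heavier, business-driven penalty
```

Under the standard AIC the richer model wins (2007 against 2010), but for these fits any penalty above 5 per parameter tips the choice to the simpler model, since the extra parameter improves the log-likelihood by only 2.5.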

Tags: AIC, log-likelihood, model fit