The Emperor's New Clothes, Part II

In my previous blog I described a real case where so-called artificial intelligence (AI) would have struggled to spot data problems that a (suspicious) human could find.  But what if the input data are clean and reliable?  Can machine learning via a neural network provide a better-fitting mortality model than (say) an ordinary GLM or survival model?

First we need to define what we mean by "better fitting".  To do this we need an objective function, i.e. a quantitative measure of fit or prediction power.  Various objective functions are available, from the likelihood in a probabilistic model to the mean absolute percentage error (MAPE).  However, there is a catch - almost any model will fit better if we add some more parameters, even useless ones.  The improvement in fit might be too small to be meaningful, so we need an objective function that penalises useless complexity.  To justify adding an extra parameter, that parameter needs to improve the fit enough to earn its keep.  We therefore need an objective function that takes into account Occam's Razor, i.e. simpler models are generally better unless you have solid evidence to justify greater complexity.  As I discussed in a very early blog, complexity has a cost that must be paid for.

In statistics one such class of objective function is the information criterion.  This is a combination of the log-likelihood (to measure model fit) and a penalty based on the number of parameters (to penalise complexity).  Various information criteria are in common use, such as Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC).  The choice depends on the application - in Richards (2022) I showed how the BIC was better than the AIC for a mortality model where the number of parameters was large ("large" here still meant fewer than 50 parameters overall).
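To make this concrete, here is a minimal sketch of the two criteria in Python.  The model fits and parameter counts below are purely hypothetical, chosen to illustrate how the BIC penalises extra parameters more heavily than the AIC when the sample is large:

```python
import math

def aic(log_lik: float, k: int) -> float:
    """Akaike's Information Criterion: 2k - 2*log-likelihood (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: the penalty grows with sample size n."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical example: two models fitted to the same n = 10,000 lives.
# Model A: 5 parameters, log-likelihood -4,100
# Model B: 8 parameters, log-likelihood -4,095 (better fit, more parameters)
n = 10_000
print(aic(-4100, 5), bic(-4100, 5, n))   # Model A
print(aic(-4095, 8), bic(-4095, 8, n))   # Model B
```

With these illustrative numbers the AIC prefers Model B, but the BIC prefers the simpler Model A: the three extra parameters improve the fit, but not by enough to earn their keep under the stiffer BIC penalty.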

What about neural networks?  Even modest neural networks have upwards of 1,000 parameters.  Often, when proponents of such models compare their results against (say) a GLM, they show that their neural network has a better fit or greater predictive power. However, these comparisons seldom make any mention of complexity, so the comparison is stacked against the more parsimonious GLM.  For a comparison to be truly meaningful, an objective function must include not only a measure of fit, but also a penalty against useless complexity.  The next time someone claims that their neural network produces a better mortality model, ask them what information criterion they used.  If complexity is not penalised, the comparison isn't very informative.
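The point can be sketched numerically.  The log-likelihoods and parameter counts below are invented for illustration, but the pattern is the one to watch for: a 2,000-parameter network with a slightly better raw fit than a 20-parameter GLM can still lose decisively once complexity is penalised:

```python
import math

def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: lower is better."""
    return k * math.log(n) - 2 * log_lik

n = 100_000  # hypothetical number of lives in the portfolio

glm_ll, glm_k = -52_000.0, 20     # parsimonious GLM (illustrative figures)
nn_ll,  nn_k  = -51_800.0, 2_000  # neural net: better raw fit, far more parameters

print(bic(glm_ll, glm_k, n))  # ~104,230
print(bic(nn_ll,  nn_k,  n))  # ~126,626 - the network's fit advantage is
                              # swamped by its parameter penalty
```

A comparison that reported only the log-likelihoods (or only a fit statistic like MAPE) would declare the network the winner; the BIC tells a very different story.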

References:

Richards, S. J. (2022) Allowing for shocks in portfolio mortality models, British Actuarial Journal, 27:e1, doi:10.1017/S1357321721000180.

Information criteria in Longevitas and the Projections Toolkit

Longevitas produces the AIC and BIC for every fitted model.  The Projections Toolkit uses only the BIC, as projection models tend to have much larger numbers of parameters.
