Back to the future with Whittaker smoothing

Many actuaries will be familiar with Whittaker smoothing (1923) but few will be aware of the close connection between this early method and the method of P-splines. The purpose of this blog is to explain this connection.

Figure 1 is a typical plot of log(mortality) of the kind of data we might want to smooth; here we use UK male data for 2011 for ages 2 to 30 taken from the Human Mortality Database.

Figure 1: Crude and Whittaker smoothed log mortality rates for UK males ages 2 to 30 in 2011.

Let $y_i = \log(d_i/e_i), i = 1, \ldots, n$, be our data where the $d_i$ are the observed deaths at age $i$ and the $e_i$ are the corresponding central exposures, and let $\mu_i, i = 1, \ldots, n$, be the candidate smooth values of $y_i$. Whittaker realizes that smoothing data is achieved by striking a balance between fidelity to the data and smoothness of the fitted curve. He defines fidelity and smoothness as follows: fidelity is measured by the sum of squares of deviations: $$\begin{equation} \label{eq:ssq} SS = \sum_1^n (y_i - \mu_i)^2 \end{equation}$$ and smoothness is measured "by the smallness of the sum of the squares of third differences" $$\begin{equation} \label{eq:smooth} \begin{split} M_3 = (\mu_4 - 3\mu_3 + 3\mu_2 - \mu_1)^2 + (\mu_5 - 3\mu_4 + 3\mu_3 - \mu_2)^2 \\ + \ldots + (\mu_n - 3\mu_{n-1} + 3\mu_{n-2} - \mu_{n-3})^2 \end{split} \end{equation}$$ Whittaker must now balance the relative sizes of $SS$ and $M_3$ which he achieves by considering $$\begin{equation} \label{eq:Balance} SS + \lambda M_3 \end{equation}$$ for some value of $\lambda$. For a given value of $\lambda$ our smooth curve is obtained by minimizing $(\ref{eq:Balance})$.

Back in 1923, the minimization of $(\ref{eq:Balance})$ for given $\lambda$ required some pretty daunting arithmetic, certainly by today's standards. Nowadays we can easily write down an expression for an explicit solution and then use the power of a modern computer to evaluate it. We write $(\ref{eq:Balance})$ in matrix/vector form as $$\begin{equation} \label{eq:Matrix} ({\bf y} - {\bf \mu})^\prime ({\bf y} - {\bf \mu}) + \lambda {\bf \mu}^\prime {\bf D}^\prime {\bf D} {\bf \mu} \end{equation}$$ where ${\bf y} = (y_1,\ldots,y_n)^\prime$, ${\bf \mu} = (\mu_1,\ldots,\mu_n)^\prime$ and ${\bf D}$ is the third order difference matrix $$\begin{equation} \label{eq:D_3} {\bf D} = \left[\begin{array}{ccccccc} 1 & -3 & 3 & -1 & 0 & 0 &\cdots \\ 0 & 1 & -3 & 3 & -1 & 0 &\cdots \\ 0 & 0 & 1 & -3 & 3 & -1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{array} \right]. \end{equation}$$ Differentiating $(\ref{eq:Matrix})$ wrt ${\bf \mu}$ and solving, we have the Whittaker smooth for given $\lambda$ is $$\begin{equation} \label{eq:W_soln} \hat {\bf \mu} = ({\bf I} + \lambda {\bf D}^\prime {\bf D})^{-1} {\bf y} \end{equation}$$ where ${\bf I}$ is the identity matrix of size $n$. We would normally choose $\lambda$ by minimizing AIC or BIC; Whittaker suggests that "we try two or three different values and see which gives the most satisfactory result". With 29 ages, equation $(\ref{eq:W_soln})$ requires the inversion of a $29 \times 29$ matrix, far beyond what was possible in 1923. Whittaker found an explicit solution which was further simplified by Henderson (1924); the method is often referred to as Whittaker-Henderson smoothing, although the arithmetic remains challenging. We used trial and error directly with $(\ref{eq:W_soln})$; $\lambda = 10$ gave a "satisfactory result" and the resulting smooth curve is shown in Figure 1.

So what is the connection between Whittaker smoothing and the method of P-splines (Eilers and Marx, 1996)? In a nutshell, Whittaker smoothing is a special case of the method of P-splines. In fact, the fundamental ideas in the Eilers and Marx paper are all in Whittaker's paper. In P-splines equation $(\ref{eq:W_soln})$ is replaced by $$\begin{equation} \label{eq:P_soln} \hat {\bf \mu} = ({\bf B}^\prime {\bf B} + \lambda {\bf D}^\prime {\bf D})^{-1} {\bf B}^\prime {\bf y} \end{equation}$$ where ${\bf B}$ is a regression matrix of B-splines of degree $d$ (Eilers and Marx, 1996 equation (13)). If we use B-splines of degree zero with knot spacing of one year then ${\bf B} = {\bf I}$ and $(\ref{eq:P_soln})$ reduces to $(\ref{eq:W_soln})$!

We mention four advantages of the P-spline method over Whittaker's method.

We have additional flexibility with B-splines.
We can use the P-spline method in a generalized linear model framework, which allows Poisson distributed deaths.
Whittaker smoothing requires knot spacing of one year. With two dimensional data, this leads to challenging numerical difficulties. For example, if we have ages 50 to 90 and years 1950 to 2010 then equation $(\ref{eq:W_soln})$ requires the storage and inversion of a $2501 \times 2501$ matrix. In contrast, with B-spline bases of degree $3$ for age and year each with knot spacing of 5 years, then equation $(\ref{eq:P_soln})$ requires only the storage and inversion of a $165 \times 165$ matrix.
The efficient GLAM algorithms (Currie, Durban and Eilers, 2006) which may be used for P-spline smoothing are not available for 2-dimensional Whittaker smoothing.

So next time you smooth mortality data with P-splines just remember that you are essentially using the ideas that Sir Edmund Whittaker dreamed up nearly a century ago.

Footnote: If you want to try Whittaker smoothing for yourself you can find an R function and data as a downloadable zip file here ; the illustrative data will enable you to reproduce Figure 1. You can find further example smoothing code at our discussion of graduation. Note that R can be downloaded free of charge from the R website. If you need assistance in installing or running R, please see the list of frequently asked questions (FAQs). Neither Longevitas Ltd nor Iain Currie can supply support in installing or running R.

References:

Currie, Durban & Eilers (2006). Generalized linear array models with applications to multidimensional smoothing, Journal of the Royal Statistical Society, Series B, 6, 259–280.

Eilers & Marx (1996). Flexible smoothing with B-splines and penalties, Statistical Science, 11, 89-121.

Henderson (1924) A new method of graduation. Transactions of the Actuarial Society of America, 25, 29-53.

Human Mortality Database. University of California, Berkeley, USA. Available at www.mortality.org (data downloaded 2015).

R Core Team, (2015). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Whittaker (1923). On a new method of graduation. Proceedings of the Edinburgh Mathematical Society, 41, 63-73.

Written by: Iain Currie

Publication Date: 07 August 2015

Last Updated: 07 August 2015

Tags: Whittaker smoothing, splines, P-splines, penalty function

Whittaker smoothing

The Projections Toolkit uses P-spline smoothing of which Whittaker smoothing can be considered an earlier and more restricted special case.

Users of Longevitas modelling can find a Whittaker smoothing plot in the Traditional Analysis (A/E) Excel download.

View all posts

Comments

Thanks for writing this. I'll be applying some version of this to baseline detection in signals.

Reply to comment #45

Back to the future with Whittaker smoothing

Whittaker smoothing

Comments

Add new comment

Restricted HTML