Build versus buy
In an earlier blog I quoted extensively from "The Mythical Man-Month", a book by the distinguished software engineer Fred Brooks. My blog was admittedly self-interested(!) when it cited arguments made by Brooks (and others) for when it makes sense to buy software instead of writing it yourself. However, "externally sourcing" is perhaps a better term than "buying": in addition to purchasing (or licensing) purpose-written software, one can also use freely available software. A good example is R, which itself depends upon other third-party libraries of mathematical subroutines, such as BLAS and LAPACK.
The question of what to build oneself, and what to externally source, is related to the economic concept of division of labour and the business concept of core competence. What value does your organisation add? Can components of that value be sourced more quickly or cheaply from outside your organisation? The question then is not "could we build this?", but "should we build this?". In our case the value that Longevitas adds is advanced analysis of mortality data. We develop solutions to specific actuarial business problems, and we implement these methods in software so that our clients can quickly apply them. As a software business, we therefore do a lot of coding.
However, no organisation should develop all its software, and our business is no exception. For example, there was no point in writing our own function to fit an ARIMA model for mortality forecasting when such a function exists in R. The reason for this is worth restating:
[Mathematical software] is arcane, requiring an enormous intellectual input per line of code [...] The cost to reconstruct a component of mathematical software is high, and the cost to discover the functionality of an existing component is low.
Van Snyder, quoted in Brooks (1995, p223)
A different situation arose when we wanted to implement survival models for actuarial use. Besides their power, survival models are tailor-made for the kind of data that actuaries typically have: individual records with detailed covariates for each life. Our first port of call was again R, which has several packages supporting survival models (as one might expect from a language developed by and for statisticians). However, a key difference between actuarial and statistical applications of survival models is left-truncation. Simply put, most statisticians model the time since outset of a treatment or process. Each data point is an observation from time 0 (when life \(i\) started treatment) to time \(t_i\) (when observation of life \(i\) ends). An indicator variable \(d_i\) takes the value 1 if life \(i\) died at time \(t_i\), or 0 if life \(i\) was still alive at time \(t_i\).
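In symbols, writing \(h(t)\) for the hazard at time \(t\) since outset (notation assumed here purely for illustration), and assuming independent lives, the usual log-likelihood for data of this kind is
\[ \ell = \sum_i \left[ d_i \log h(t_i) - \int_0^{t_i} h(s)\,ds \right]. \]
The integral starts at time 0 because every life is observed from the outset.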
In contrast, a holder of a life-assurance product only becomes known to actuaries well into adult life at age \(x_i\). Life \(i\) is then observed for time \(t_i\) with the same indicator variable \(d_i\) describing the life's status at age \(x_i+t_i\). Actuarial data are therefore missing the period from birth (age 0) to the point a contract started or a pensioner started receiving a benefit (age \(x_i\)). Since there was no existing software that fitted left-truncated survival models, we had to write our own. As the Van Snyder quote above suggests, this was a non-trivial endeavour, but it was core to our business.
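To make the effect of left-truncation concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation described above: it assumes a Gompertz hazard \(\mu_x = e^{\alpha+\beta x}\) at age \(x\), and the function names and example data are hypothetical. Under those assumptions life \(i\) contributes \(d_i \log \mu_{x_i+t_i} - \int_0^{t_i} \mu_{x_i+s}\,ds\) to the log-likelihood, so exposure is counted only from the entry age \(x_i\) and not from age 0.

```python
import math


def gompertz_integrated_hazard(alpha, beta, age_from, age_to):
    """Integral of exp(alpha + beta * x) over ages age_from to age_to.

    Assumes beta is non-zero (the closed form divides by beta).
    """
    upper = math.exp(alpha + beta * age_to)
    lower = math.exp(alpha + beta * age_from)
    return (upper - lower) / beta


def left_truncated_loglik(alpha, beta, lives):
    """Log-likelihood for lives given as (entry_age, time_observed, died).

    entry_age plays the role of x_i, time_observed of t_i and died of d_i.
    """
    total = 0.0
    for entry_age, time_observed, died in lives:
        exit_age = entry_age + time_observed
        # Left-truncation: integrate the hazard from the entry age x_i,
        # not from age 0, because the life was unobserved before x_i.
        total -= gompertz_integrated_hazard(alpha, beta, entry_age, exit_age)
        if died:
            # Log of the Gompertz hazard at the age of death.
            total += alpha + beta * exit_age
    return total


# Hypothetical data: (entry age, years observed, death indicator).
example_lives = [(60.0, 5.0, 0), (65.0, 3.5, 1), (70.0, 10.0, 1)]
print(left_truncated_loglik(-10.0, 0.1, example_lives))
```

In practice one would maximise this function over \(\alpha\) and \(\beta\) with a numerical optimiser; the sketch shows only how the entry age enters the calculation.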
Survival models are very useful for actuarial mortality analysis, and so have an application for many insurers and consultancies. And of course actuaries typically have the skills to do the necessary mathematics and coding. However, actuaries are also expensive, and they are often in short supply. The management question is therefore where actuarial staff are most profitably employed: writing software, or producing value-added insights for clients using existing software? Each organisation must decide for itself, with time, cost and opportunity all playing a role in the decision.
References:
Brooks, F. P. (1995) The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, ISBN 0-201-83595-9.