## Quantiles and percentiles

Quantiles are points taken at regular intervals from the cumulative distribution function of a random variable. They are generally described as q-quantiles, where q is the number of equal-probability intervals, which are separated by q − 1 points. For example, the 2-quantile is the median, i.e. the point where values of a distribution are equally likely to be above or below this point.
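As a small illustration of q-quantiles (our own example, with arbitrary data), Python's standard-library `statistics.quantiles()` returns the q − 1 cut points that split a data set into q intervals. With `method="inclusive"` its documented interpolation formula places the cut for level p at 0-based position (n − 1)p. That is the same rule as the Excel definition discussed next.

```python
import statistics

# Arbitrary illustrative data: the integers 1 to 9.
data = list(range(1, 10))

# q = 2 gives a single cut point: the median.
median_cut = statistics.quantiles(data, n=2, method="inclusive")

# q = 4 gives q - 1 = 3 cut points: the quartiles.
quartile_cuts = statistics.quantiles(data, n=4, method="inclusive")

print(median_cut)     # [5.0]
print(quartile_cuts)  # [3.0, 5.0, 7.0]
```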

A percentile is the name given to a 100-quantile. In Solvency II work we most commonly look for the 99.5th percentile, i.e. the point at which the probability that a random event exceeds this value is 0.5%. The simplest approach to estimating the 99.5th percentile might be to simulate 1,000 times, sort the results in ascending order and take the 995th or 996th value.

However, there are several alternative ways of estimating a quantile or percentile, as documented by Hyndman and Fan (1996). One of the commonest approaches is the definition used by Microsoft Excel, which is also type 7 in the R function quantile(). In general, we seek a percentile level p which lies in the interval (0, 1). If x[i] denotes the ith smallest of the n values in a data set, then the percentile sought by Excel is x[(n − 1)p + 1]. When (n − 1)p + 1 is not an integer, Excel interpolates linearly between the two neighbouring values.
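The Excel / type 7 rule can be sketched in a few lines of Python. This is our own illustration, not code taken from Excel or R, and the function name `quantile_type7` is hypothetical. It sorts the data so that x[i] is the ith smallest value. It then computes the 1-based position h = (n − 1)p + 1 and interpolates linearly between x[floor(h)] and x[floor(h) + 1].

```python
import math


def quantile_type7(data, p):
    """Sample p-quantile using the Excel / R type 7 rule (illustrative sketch).

    The 1-based position is h = (n - 1) * p + 1. When h is not an integer,
    interpolate linearly between x[floor(h)] and x[floor(h) + 1], where x[i]
    is the ith smallest value.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(data)
    n = len(xs)
    h = (n - 1) * p + 1      # 1-based position
    lo = math.floor(h)       # 1-based index of the lower neighbour
    frac = h - lo            # weight placed on the upper neighbour
    if lo >= n:              # p == 1 (or n == 1): return the largest value
        return xs[-1]
    # xs is 0-based, so x[lo] is xs[lo - 1] and x[lo + 1] is xs[lo].
    return (1 - frac) * xs[lo - 1] + frac * xs[lo]


print(quantile_type7([1, 2, 3, 4], 0.5))   # 2.5
print(quantile_type7([1, 2, 3, 4], 0.25))  # 1.75
```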

To illustrate the calculation of sample quantiles, consider the following R commands to generate a simulated loss distribution of 1,000 values from the N(0,1) distribution:

```r
# Generate some pseudo-random N(0,1) variates
set.seed(1)
temp = rnorm(1000)
# Show the seven largest values (positions 994 to 1000 after sorting)
sort(temp)[994:1000]
```

When these commands are run you should see the seven largest values as follows:

2.401618, 2.446531, 2.497662, 2.649167, 2.675741, 3.055742, 3.810277

In this example, n = 1000 and p = 0.995, so (n − 1)p + 1 = 999 × 0.995 + 1 = 995.005. This value is not an integer, so we must interpolate between the 995th and 996th values in ascending order. These are the second and third of the seven values shown above. The final answer is then:

0.995 × 2.446531 + 0.005 × 2.497662 = 2.447
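The interpolation can be checked in Python using only the seven values printed above. Because Python's random numbers differ from R's, the sketch does not re-simulate. Instead it keys each printed value by its assumed 1-based position (994 to 1000) in the ascending sort of all 1,000 values.

```python
import math

# The seven largest simulated values printed above, keyed by their
# 1-based position in the ascending sort of all n = 1,000 values.
tail = {
    994: 2.401618, 995: 2.446531, 996: 2.497662, 997: 2.649167,
    998: 2.675741, 999: 3.055742, 1000: 3.810277,
}

n, p = 1000, 0.995
h = (n - 1) * p + 1      # 995.005, up to floating-point rounding
lo = math.floor(h)       # 995
frac = h - lo            # about 0.005

# Weight 1 - frac on the 995th value and frac on the 996th value.
estimate = (1 - frac) * tail[lo] + frac * tail[lo + 1]
print(round(estimate, 3))  # 2.447
```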

So we now have our estimate of the 99.5th percentile. What is often overlooked is that the sample percentile is only an estimate, i.e. there is uncertainty over the true value of the underlying 99.5th percentile. In fact, sample percentiles belong to a branch of probability theory called order statistics, and it turns out that the sample percentile above is not the most efficient estimator. There are many other estimators, one of which is due to Harrell and Davis (1982). One reason the Harrell-Davis estimator is more efficient is that it uses a weighted average of all the order statistics, rather than just the one or two nearest the target position. In the example above, the Harrell-Davis estimate of the 99.5th percentile can be found with the following extra R commands:

```r
# Calculate the 99.5th percentile and a standard error for it
library(Hmisc)  # provides hdquantile()
hdquantile(temp, 0.995, se=TRUE)
```

This yields an estimate of the 99.5th percentile of 2.534. That is closer than the Excel-style estimate (2.447) to the known 99.5th percentile of the N(0,1) distribution (2.576), which is consistent with the Harrell-Davis estimator being more efficient, although a single sample cannot establish this on its own. Perhaps even more useful is the fact that the Harrell-Davis estimator comes with a standard error, which here is 0.136. Table 1 shows how the estimate and its standard error change with the number of simulations, n. It therefore shows how many simulations are needed to get within a given distance of the true underlying 99.5th percentile.

Table 1. Harrell-Davis estimates and standard errors of the 99.5th percentile of n N(0,1) variates. Source: Own calculations.

| n | 99.5th percentile | Standard error | Coefficient of variation |
|---:|---:|---:|---:|
| 1,000 | 2.534 | 0.136 | 5.4% |
| 10,000 | 2.517 | 0.047 | 1.9% |
| 25,000 | 2.564 | 0.027 | 1.1% |
| 50,000 | 2.577 | 0.020 | 0.8% |
| 100,000 | 2.564 | 0.014 | 0.5% |
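The last column of Table 1 is the standard error divided by the estimate. As a quick check, the Python below uses only the figures in the table to reproduce that column. It also computes SE × √n, which stays roughly constant at about 4.3 to 4.7. That is what we would expect if the standard error falls in proportion to 1/√n, in which case quadrupling the number of simulations roughly halves the standard error.

```python
import math

# (n, Harrell-Davis 99.5th percentile, standard error), copied from Table 1.
table1 = [
    (1_000, 2.534, 0.136),
    (10_000, 2.517, 0.047),
    (25_000, 2.564, 0.027),
    (50_000, 2.577, 0.020),
    (100_000, 2.564, 0.014),
]

# Coefficient of variation, in percent, to one decimal place.
cov_pct = [round(100 * se / est, 1) for _, est, se in table1]

# SE * sqrt(n) is roughly constant if the SE shrinks like 1/sqrt(n).
scaled = [se * math.sqrt(n) for n, _, se in table1]

print(cov_pct)  # [5.4, 1.9, 1.1, 0.8, 0.5]
for (n, _, _), s in zip(table1, scaled):
    print(f"{n:>7,}  SE*sqrt(n) = {s:.2f}")
```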

References

Harrell, F. E. and Davis, C. E. (1982) A new distribution-free quantile estimator. Biometrika, 69, 635–640.

Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages. The American Statistician, 50(4), 361–365.

Stephen Richards is the Managing Director of Longevitas.
##### Quantiles in our software suite

Longevitas, the Projections Toolkit and mortalityrating.com calculate their quantiles and percentiles using the same definition as used by Microsoft Excel, which in turn is the same as option type 7 in the R function quantile().

Longevitas and the Projections Toolkit also provide quantile reports which use the Harrell-Davis estimate of the 99.5th percentile.