Quantiles and percentiles

Quantiles are points taken at regular intervals from the cumulative distribution function of a random variable. They are generally described as q-quantiles, where q is the number of intervals of equal probability, separated by q − 1 points. For example, the 2-quantile is the median, i.e. the point at which values of the distribution are equally likely to lie above or below.
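As a minimal sketch (the helper name cut_probs() is hypothetical and not part of any package), the q − 1 cut points of a q-quantile are the probability levels 1/q, 2/q, ..., (q − 1)/q. Passing these to R's qnorm() gives the theoretical quantiles of N(0,1), and passing them to quantile() gives sample quantiles of a data set:

```r
# Hypothetical helper: the q - 1 probability levels that split [0, 1]
# into q intervals of equal probability
cut_probs <- function(q) seq_len(q - 1) / q

cut_probs(2)    # 0.5: the single cut point of the 2-quantile is the median
cut_probs(4)    # 0.25 0.50 0.75: the quartiles

# Theoretical quartiles of N(0,1), read off the inverse CDF
qnorm(cut_probs(4))    # approximately -0.674, 0, 0.674

# Sample quartiles of a small data set (R's default definition, type 7)
quantile(1:9, probs = cut_probs(4))    # 3, 5 and 7
```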

A percentile is the name given to a 100-quantile. In Solvency II work we most commonly look for the 99.5th percentile, i.e. the value which a random outcome exceeds with probability 0.5%. The simplest approach to estimating the 99.5th percentile might be to simulate 1,000 times and take the 995th or 996th value in ascending order. However, there are several alternative ways of estimating a quantile or percentile, as documented by Hyndman and Fan (1996). One of the commonest is the definition used by Microsoft Excel, which is also option type 7 (the default) in the R function quantile(). In general, we seek the percentile at a level p in the interval (0, 1). If x[i] denotes the ith smallest value in a data set of n values, then the percentile sought by Excel is x[(n − 1)p + 1], with linear interpolation between neighbouring values when (n − 1)p + 1 is not an integer.
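The Excel / type 7 definition can be written out directly. The function pct_type7() below is a hypothetical sketch, assuming a numeric vector with no missing values; it should agree with quantile(x, p, type = 7):

```r
# Hypothetical function: sample percentile using the Excel / R type 7
# definition x[(n - 1)p + 1], interpolating linearly between neighbours
pct_type7 <- function(x, p) {
  stopifnot(is.numeric(x), length(x) > 0, p >= 0, p <= 1)
  x  <- sort(x)              # x[i] is now the ith smallest value
  n  <- length(x)
  h  <- (n - 1) * p + 1      # target position, possibly fractional
  lo <- floor(h)             # order statistic at or just below h
  hi <- ceiling(h)           # order statistic at or just above h
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

# Usage: compare with R's built-in definition on some arbitrary data
x <- c(3.1, -0.4, 2.2, 5.9, 1.0, 0.7)
pct_type7(x, 0.9)
quantile(x, 0.9, type = 7)
```

Here n = 6 and p = 0.9, so the target position is 5.5 and both calls interpolate halfway between the 5th and 6th smallest values (3.1 and 5.9).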

To illustrate the calculation of sample quantiles, consider the following R commands to generate a simulated loss distribution of 1,000 values from the N(0,1) distribution:

# Generate some pseudo-random N(0,1) variates
set.seed(1)
temp <- rnorm(1000)
# Show the seven largest values (positions 994 to 1000 in ascending order)
sort(temp)[994:1000]

When these commands are run you should see the seven largest values as follows:

2.401618, 2.446531, 2.497662, 2.649167, 2.675741, 3.055742, 3.810277

In this example, n = 1000 and p = 0.995, so (n − 1)p + 1 = 995.005. This value is not an integer, so we must interpolate between the 995th and 996th values in ascending order (2.446531 and 2.497662), giving a weight of 0.005 to the latter. The final answer is then:

0.995 × 2.446531 + 0.005 × 2.497662 = 2.447
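The same interpolation can be checked in R. This sketch uses only base R and re-creates the simulated values from above; the manual calculation should match quantile() with its default type = 7:

```r
# Re-create the simulated losses used above
set.seed(1)
temp <- rnorm(1000)

n  <- length(temp)
p  <- 0.995
h  <- (n - 1) * p + 1     # 995.005: not an integer
lo <- floor(h)            # 995
w  <- h - lo              # weight on the next order statistic, about 0.005

xs <- sort(temp)          # ascending order statistics
manual <- (1 - w) * xs[lo] + w * xs[lo + 1]

manual
quantile(temp, p)         # default type = 7 gives the same value
```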

So we now have our estimate of the 99.5th percentile. What is often overlooked is that the sample percentile is only an estimate, i.e. there is uncertainty over the true underlying value of the 99.5th percentile. Sample percentiles belong to the branch of statistics known as order statistics, and it turns out that the sample percentile above is not the most efficient estimator. There are many alternatives, one of which is due to Harrell and Davis (1982). One reason the Harrell-Davis estimator is more efficient is that it uses a weighted average of all of the order statistics, rather than just the two either side of the target position. In the example above, the Harrell-Davis estimate of the 99.5th percentile can be found with the following extra R commands:

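# Calculate the 99.5th percentile and a standard error for it
# (run install.packages("Hmisc") first if the package is not installed)
library(Hmisc)
hd <- hdquantile(temp, 0.995, se = TRUE)
hd               # the estimate, with the standard error attached as attribute "se"
attr(hd, "se")   # the standard error on its own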

This yields an estimate of the 99.5th percentile of 2.534. In this example the Harrell-Davis estimate is closer to the known 99.5th percentile of the N(0,1) distribution (2.576), which is consistent with it being the more efficient estimator. Perhaps even more useful is that the Harrell-Davis estimator comes with a standard error, which here is 0.136. Table 1 shows how the estimate and its standard error change as the number of simulations increases, and hence how many simulations are required to get within a given distance of the true underlying 99.5th percentile.
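To see how the Harrell-Davis estimator uses all of the data, it can be written as a weighted average of every order statistic x[1], ..., x[n], with weights W[i] = I(i/n) − I((i − 1)/n), where I is the cumulative distribution function of a beta distribution with parameters p(n + 1) and (1 − p)(n + 1). The base-R function hd_sketch() below is a hypothetical minimal sketch of that formula (point estimate only, with no standard error); it is not the Hmisc code itself, although its results can be compared against hdquantile():

```r
# Hypothetical sketch of the Harrell-Davis (1982) point estimate of the
# p-quantile: a beta-weighted average of all the order statistics
hd_sketch <- function(x, p) {
  stopifnot(is.numeric(x), length(x) > 0, p > 0, p < 1)
  x <- sort(x)                       # order statistics x[1] <= ... <= x[n]
  n <- length(x)
  a <- p * (n + 1)                   # beta shape parameters
  b <- (1 - p) * (n + 1)
  cdf <- pbeta((0:n) / n, a, b)      # beta CDF at 0, 1/n, ..., 1
  w <- diff(cdf)                     # weight on each order statistic; sums to 1
  sum(w * x)
}

# Usage: every observation receives some weight, but the weights are
# concentrated on the order statistics near position p * n
set.seed(1)
temp <- rnorm(1000)
hd_sketch(temp, 0.995)
```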

Table 1. Harrell-Davis estimates of the 99.5th percentile of n simulated N(0,1) variates, with standard errors and coefficients of variation (standard error divided by estimate). Source: own calculations.

n          99.5th percentile   Standard error   Coefficient of variation
1,000      2.534               0.136            5.4%
10,000     2.517               0.047            1.9%
25,000     2.564               0.027            1.1%
50,000     2.577               0.020            0.8%
100,000    2.564               0.014            0.5%
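The coefficient of variation in Table 1 is simply the standard error divided by the estimate. The sketch below recomputes it from the figures in the table and also checks a rough rule of thumb, which is an assumption rather than something stated in the table: that the standard error falls roughly in proportion to 1/sqrt(n), so that se × sqrt(n) stays roughly constant. The final extrapolation to a target standard error is likewise only a rough, assumed guide.

```r
# Figures transcribed from Table 1
tab <- data.frame(
  n        = c(1000, 10000, 25000, 50000, 100000),
  estimate = c(2.534, 2.517, 2.564, 2.577, 2.564),
  se       = c(0.136, 0.047, 0.027, 0.020, 0.014)
)

# Coefficient of variation = standard error / estimate, as a percentage
tab$cv_pct <- round(100 * tab$se / tab$estimate, 1)

# If the standard error shrinks like 1/sqrt(n), se * sqrt(n) stays roughly flat
tab$se_root_n <- round(tab$se * sqrt(tab$n), 2)
tab

# Rough (assumed) extrapolation: simulations needed for a target standard error
k <- mean(tab$se * sqrt(tab$n))
target_se <- 0.01
ceiling((k / target_se)^2)
```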

References

Harrell, F. E. and Davis, C. E. (1982) A new distribution-free quantile estimator. Biometrika, 69, 635–640. 

Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages. The American Statistician, 50(4), 361–365.

Stephen Richards
Stephen Richards is the Managing Director of Longevitas
Quantiles in our software suite

Longevitas, the Projections Toolkit and mortalityrating.com calculate their quantiles and percentiles using the same definition as used by Microsoft Excel, which in turn is the same as option type 7 in the R function quantile().

In addition, Longevitas and the Projections Toolkit also provide quantile reports which use the Harrell-Davis estimates of the 99.5th percentile.