
A major part of our research is about producing quantitative estimates in order to help inform our decisions and recommendations. That could include which charities to recommend, what to research in more depth, or which career to choose. This post explains some methodological issues around making these estimates, and how we are addressing them.

We want to justify our estimates, and to make them as accurate as we can. However, just about everything has some uncertainty attached to it, and it would be hopelessly counterproductive to ignore every estimate we are not completely confident about. Nor should we forget about our uncertainty, as it affects how we use those estimates. So we'd really like to keep track of the uncertainty, along with the other parts of the estimate.

This is all well known, and those making estimates are generally aware of it. Typically people keep track of upper and lower bounds, as well as a best guess. For example, if estimating the number of piano tuners in Chicago, you might say your best guess is 125 but also give a lower bound of 30 and an upper bound of 500. While this is much better than ignoring uncertainty, we've noticed some problems with this approach.

Firstly, it's not clear what is meant by "bound". Since it's very rare for us to be completely certain something lies above or below a given figure(1), the bounds presumably play a similar role to a confidence interval. But different people may have different definitions in mind. For example, one author could use "lower bound" to mean the level they are 90% certain the true value lies above (the 10th percentile of their subjective estimate), a second could use the same term to mean the 5th percentile, and a third the 1st percentile. The numbers produced are not going to be meaningfully comparable. This is tolerable for back-of-the-envelope calculations, but when we are using such judgements to guide decisions, it is worth standardising. Similarly, it can matter whether the "best guess" is supposed to represent a median or a mean (the distributions we are considering often have fat tails, which can cause these two to diverge substantially).

A second issue is easiest to explain with an example. Suppose we want to estimate the impact *X* of a campaign to persuade more people to volunteer for GWWC. We can break down *X* as the product *A × B*, where *A* is the number of volunteers it generates, and *B* is the average (mean) impact of each volunteer. This expression may be useful because we have a better idea of how to estimate *A* and *B* individually. Let's assume now that we have some best guesses E(*A*) and E(*B*) for *A* and *B*. We multiply these together to get a best guess E(*X*) = E(*A*) × E(*B*) for *X*(2). It seems we should also be able to multiply our upper bounds *U(A)* and *U(B)* to get an upper bound *U(X)* for *X*. But if we choose a given meaning for upper bound, let's say the 90th percentile, we can see the problem with this approach.

There's only a 1 in 10 chance that the true value of *A* lies above *U(A)*, and a 1 in 10 chance that the true value of *B* lies above *U(B)*. It seems possible for one of these to happen without the other, so the two are not perfectly correlated. If very high or low values of *A* and *B* don't tend to occur together, there's much less than a 1 in 10 chance that both are extreme enough for the true value of *X = A × B* to lie above *U(A) × U(B)*. So *U(X)* should actually be lower than *U(A) × U(B)*. Similarly, our lower bound for *X* should be a bit higher than the product of the lower bounds for *A* and *B*.
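The effect described above can be checked with a small simulation. This is only a sketch under stated assumptions: *A* and *B* are taken to be independent and log-normal, and the parameter values are made up for illustration rather than taken from any real estimate.

```python
import random
from math import exp
from statistics import NormalDist

# Hypothetical inputs: ln(A) and ln(B) are independent normals (an assumption).
MU_A, SIGMA_A = 3.0, 1.0   # A: number of volunteers (illustrative numbers)
MU_B, SIGMA_B = 0.0, 1.0   # B: mean impact per volunteer (illustrative numbers)

z90 = NormalDist().inv_cdf(0.90)   # standard normal 90th percentile
u_a = exp(MU_A + z90 * SIGMA_A)    # U(A): 90th percentile of A
u_b = exp(MU_B + z90 * SIGMA_B)    # U(B): 90th percentile of B

rng = random.Random(0)             # fixed seed so the run is repeatable
n = 100_000
products = sorted(
    rng.lognormvariate(MU_A, SIGMA_A) * rng.lognormvariate(MU_B, SIGMA_B)
    for _ in range(n)
)

# Share of simulated X = A * B values that exceed U(A) * U(B).
share_above = sum(x > u_a * u_b for x in products) / n
# Empirical 90th percentile of X.
u_x = products[int(0.90 * n)]

print(f"U(A) * U(B)        = {u_a * u_b:.1f}")
print(f"simulated U(X)     = {u_x:.1f}")
print(f"P(X > U(A) * U(B)) = {share_above:.3f}")
```

Under these assumptions the product of the two 90th percentiles is exceeded only roughly 3 to 4% of the time, not 10%, and the simulated 90th percentile of *X* sits well below *U(A) × U(B)*. If *A* and *B* were positively correlated the gap would shrink.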

To be more precise about how to find an upper bound on *X* from those on *A* and *B*, we can build a model of how our estimation error is distributed. We are developing such a model. It also aims to help with the first issue discussed above by standardising procedure and usage.

I'll give an overview of the model while ignoring some technicalities. Our model is for cases where we can break up the value *X* we're trying to estimate as a product *Y(1) Y(2)...Y(n)* of other values, as in the example above. We assume a log-normal distribution for *X*. The justification for this assumption is that the logarithm of a product is the sum of the logarithms of its factors, so if a lot of variables are multiplied together, by the central limit theorem we expect the product to converge to a log-normal distribution(3).
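To get a feel for this justification, here is a rough simulation. It is an illustration rather than part of the model: the factors are drawn from a uniform distribution on an arbitrary range, chosen precisely because it is not log-normal, and they are assumed to be independent.

```python
import random
from math import log
from statistics import mean, median, pstdev

def skewness(values):
    """Sample skewness: the mean cubed z-score (near 0 for a symmetric sample)."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(1)   # fixed seed so the run is repeatable
N_FACTORS = 30           # hypothetical number of factors Y(i)
N_SAMPLES = 20_000

products = []
for _ in range(N_SAMPLES):
    x = 1.0
    for _ in range(N_FACTORS):
        x *= rng.uniform(0.5, 2.0)   # each factor is deliberately NOT log-normal
    products.append(x)

log_products = [log(x) for x in products]

print("skewness of X      :", round(skewness(products), 2))
print("skewness of ln(X)  :", round(skewness(log_products), 2))
print("mean / median of X :", round(mean(products) / median(products), 2))
```

In this setup ln(*X*) comes out close to symmetric, as a normal distribution would be, while *X* itself is heavily skewed with a mean far above its median. The simulation says little about how good the approximation is far out in the tails.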

To work out the parameters for the log-normal for *X*, we just need to keep track of a couple of values associated with each of the *Y(i)*. We can normally infer values for these if we have a best guess and a confidence interval, which is what people are used to providing. We want this method to be accessible, so we'll provide a spreadsheet to take care of the calculations, so that you can use it without having to follow all of the technical details.
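As a rough sketch of how such a calculation might go (not the spreadsheet itself), suppose each factor is log-normal, the factors are independent, and "lower bound" and "upper bound" are read as the 10th and 90th percentiles. All three are assumptions made here for illustration, and the function names and input numbers are hypothetical. The two values tracked per factor are then the mean and standard deviation of its logarithm.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)   # standard normal 90th percentile

def lognormal_from_interval(lower, upper, z=Z90):
    """Return (mu, sigma) of ln(value), reading the bounds as 10th/90th percentiles."""
    mu = (log(lower) + log(upper)) / 2
    sigma = (log(upper) - log(lower)) / (2 * z)
    return mu, sigma

def combine(factors):
    """Parameters of ln(X) for X = product of independent log-normal factors."""
    mu = sum(m for m, _ in factors)
    sigma = sqrt(sum(s * s for _, s in factors))   # variances add, not sigmas
    return mu, sigma

def summarise(mu, sigma, z=Z90):
    """Bounds, median and mean of a log-normal with the given log-parameters."""
    return {
        "lower": exp(mu - z * sigma),
        "median": exp(mu),
        "mean": exp(mu + sigma ** 2 / 2),
        "upper": exp(mu + z * sigma),
    }

# Hypothetical inputs: A between 20 and 200, B between 0.5 and 8.
a = lognormal_from_interval(20, 200)
b = lognormal_from_interval(0.5, 8)
x = summarise(*combine([a, b]))
for name, value in x.items():
    print(f"{name:>6}: {value:.1f}")
```

With these inputs the combined upper bound is well below 200 × 8 and the combined lower bound is above 20 × 0.5. The mean is also noticeably higher than the median, which is the divergence between the two kinds of "best guess" that a fat tail produces.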

Finally, what should we use these numbers for? Let's assume we're evaluating the cost-effectiveness of different interventions, in DALYs/$. We might use different aspects of the figures for different purposes. If we want to recommend an intervention we are sure is doing good, we might look for a high lower bound. This will tend to favour cases where our estimates have low uncertainty.

In other cases we might care more about the expected DALYs/$, which is likely to be higher than our best guess, and in cases with great uncertainty significantly higher, because of the fat tail of the log-normal distribution. We should be careful when doing this, because we are likely to see regression to the mean, especially out in the tail: if we have lots of estimates with large uncertainty, some of them are likely to be substantially overestimated by chance, and we risk focusing our attention on these inappropriately. We're still considering the appropriate way to deal with this.

One thing which is clear is that these estimates can be very helpful in identifying where further research is likely to be most useful. Things with high expectation and high uncertainty are the best candidates. You can also see which parts of the calculation are contributing the most uncertainty and warrant further analysis. In most cases, we'd expect this research to reduce our expectation of the intervention's cost-effectiveness, but when it increases it, it can do so by a large amount, and these cases can be of huge potential importance.
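The regression-to-the-mean concern raised above can be illustrated with a toy simulation. The setup is entirely hypothetical: every intervention has the same true cost-effectiveness, each estimate is off by an independent log-normal error, and we always look at whichever estimate came out highest.

```python
import random
from statistics import mean

rng = random.Random(2)   # fixed seed so the run is repeatable
TRUE_VALUE = 1.0         # hypothetical: every intervention is truly worth 1 unit of DALYs/$
SIGMA = 1.0              # hypothetical spread of ln(estimate / true value)
N_INTERVENTIONS = 50
N_TRIALS = 2_000

top_estimates = []
for _ in range(N_TRIALS):
    # Noisy estimates: true value times a log-normal error with median 1.
    estimates = [
        TRUE_VALUE * rng.lognormvariate(0.0, SIGMA)
        for _ in range(N_INTERVENTIONS)
    ]
    top_estimates.append(max(estimates))

avg_top = mean(top_estimates)
print(f"true value of every intervention : {TRUE_VALUE}")
print(f"average estimate of the 'winner' : {avg_top:.1f}")
```

Even though no intervention is actually better than any other here, the one with the highest estimate looks several times better than it really is. The larger the uncertainty and the more estimates we compare, the stronger this selection effect becomes.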

(1) Bounds of 0 and 1 on probabilities are a notable exception.

(2) Aside for the technically minded: this works if "best guess" means expectation; it also works for medians if our subjective probability distributions for the true values are log-normal.

(3) Though note convergence can take a long time in the tail, something we hope to explore more in future.