Estimating the Effectiveness of DCP2

When searching for high-effectiveness charities, there are a few resources that are incredibly useful. One of those is the output of the Disease Control Priorities Project (DCP), which provides an analysis of the cost-effectiveness of a great number of health interventions. Given the value of this data for us as donors, we might wonder just how valuable this information is, and whether it might be worth donating to DCP to produce more of it.

In this post I will present a statistical model (designed by Nic Dunkley and me) for estimating the effectiveness of donating to DCP. This will inevitably over-simplify the situation, but it is a first step that we can build on. The model may also be useful for looking at other similar organizations, although the estimates that I will derive rely on data about DCP itself.

Here's the idea. Firstly, we assume that DCP has some number of followers who make their donation decisions based on DCP's recommendations; that is, they put their donations towards whatever intervention DCP recommends most highly. If DCP says the best thing is supporting homeless shelters in the UK, then that's where the followers will send their money. So the current situation would have all that money going towards the intervention that the DCP2 report claims is most effective. We'll refer to the money donated as DCP's "money moved". Furthermore, we'll assume that this money is all translated into DALYs at the rate that DCP claims.

Now, suppose that there's a big bag of interventions that DCP assesses. Then we assume that for some amount of money, DCP can pick a random intervention from the bag and accurately assess its effectiveness. If it comes out better than their current best recommendation, then when they report on this fact, their followers will shift their donations accordingly.

This may not be the best way to model the DCP’s impact. Governments which make use of the DCP may not use it to identify the one ‘best’ treatment in order to fund that, but rather just to work out which treatments pass a minimum cost effectiveness threshold and which don’t. The higher above this threshold a project is, the more likely it is to get done. This would require a more detailed model, which we may produce at a later time.

However, if the original description were a good model of how DCP works, then we now have enough information to work out how much good we can do by donating to DCP! I'm going to be using R, a statistics package, to do the heavy lifting; it's free, and I've attached the script I used if you're keen and want to check my working.

Firstly, we need to know something about the distribution of the interventions in the bag. A quick reminder about probability distributions: the probability distribution of some random variable usually looks like a humped graph. Along the bottom are all the options we could have. In this case, that's the possible cost-effectiveness of a random health intervention, i.e. pretty much any real number. Along the side it shows how likely that possibility is. One very common type of distribution is a “normal” distribution. This just looks like a big hump in the middle that spreads out to the sides (often called a “bell-curve”). Intuitively, that means that most of the time you get one of the possibilities in the middle, and the other possibilities get less likely as they get further away from the middle. Normal distributions are quite nice to work with, and they're also pretty common. Importantly, they can be totally specified by just two parameters: the mean of the distribution (the average outcome you get), and the standard deviation (which tells you how far away from the mean things tend to be). Between the two of these you know where the middle of the hump is (the mean), and how wide it is (the standard deviation), and that's all you need to know.
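For readers who like to see it written down, the standard formula for the normal density makes the "two parameters" point concrete: the only quantities that appear in it are the mean (written here as \mu) and the standard deviation (\sigma). The symbols are notation introduced for this aside, not something taken from the DCP data.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```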

Back to our health interventions. Fortunately, we have a lot of data from DCP about the cost-effectiveness of interventions: if we plot the effectiveness of the interventions they assessed in the DCP2 report, we get a distribution that looks a bit like this:

[Figure: distribution of the cost-effectiveness of the interventions assessed in the DCP2 report]

That's definitely not a normal distribution, but it is a “log-normal” distribution. A log-normal distribution is fairly straightforward: if you take the logarithms of all the possibilities along the bottom, and then plot the distribution, you get a normal distribution. And if we do that for the DCP data, then we do indeed get a very plausible-looking bell-curve.

[Figure: distribution of the logarithms of the DCP2 cost-effectiveness data]

Now, this makes it look pretty plausible that the DCP data really does follow a (log-)normal distribution, and it'll be helpful if we can work out the two parameters for it that I mentioned above. For the moment, let's assume that taking the mean and standard deviation of the logarithms of the data gives us the right parameters – we can be more sophisticated (in particular we're going to want to think about the possibility of error), but I'll leave that for a later post.
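The parameter-fitting step just described can be sketched in a few lines. This is a minimal Python illustration rather than the R script mentioned above, and the numbers in `example_data` are made up for the example: they are not the DCP2 figures. The function name `fit_lognormal` is likewise just a label chosen here.

```python
import math
import statistics


def fit_lognormal(cost_effectiveness):
    """Return (mu, sigma): the mean and standard deviation of the log-values."""
    logs = [math.log(x) for x in cost_effectiveness]
    return statistics.mean(logs), statistics.stdev(logs)


# Hypothetical cost-effectiveness figures in DALYs/$ (NOT the DCP2 data).
example_data = [0.0005, 0.002, 0.004, 0.01, 0.03, 0.08, 0.33]

mu, sigma = fit_lognormal(example_data)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```

Note that `statistics.stdev` is the sample standard deviation; whether the original script used the sample or population version is not stated in the post.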
 

Having the parameters for the distribution means that we can get R to make up entirely new data points as if they came from the original data set. And if the actual data really is distributed in the way we think it is, then this is a lot like DCP finding and assessing a new random intervention. We can then do the rest of our model process – check whether it's a better intervention than the one we had before, and if so work out what the difference is – and get an answer for how much difference that one attempt would make. We can then use R to simulate doing this a large number of times, and then simply average the results!

Thus: we generate a large number of samples, to represent a large number of investigations by DCP. We then work out the difference in effectiveness between each “new” intervention and the current best option. The current best option is better than average, so we'll expect the difference to be negative most of the time. We can regard all those negative results as 0, however, as in that case there would (in our model) be no change in where the money goes. Finally, we average these results. Call that average A – that is, the average improvement in the effectiveness of the most recommended intervention due to DCP doing one investigation.
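The simulation described above might look something like the following. Again this is a Python sketch under stated assumptions, not the R script itself: the values of `mu` and `sigma` are placeholders rather than the fitted DCP2 parameters, and `average_improvement` is a name invented for the illustration. Only the 0.33 DALYs/$ figure for the current best option comes from the post.

```python
import math
import random


def average_improvement(mu, sigma, current_best, n_samples=100_000, seed=0):
    """Monte Carlo estimate of A: the mean of max(new - current_best, 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # One simulated "investigation": draw a random intervention from the bag.
        candidate = rng.lognormvariate(mu, sigma)
        # A worse intervention changes nothing, so it counts as 0.
        total += max(candidate - current_best, 0.0)
    return total / n_samples


# Placeholder parameters (assumptions, not the fitted DCP2 values).
A = average_improvement(mu=math.log(0.01), sigma=1.5, current_best=0.33)
print(f"A is roughly {A:.5f} DALYs/$ per investigation")
```

Because most of the average comes from a handful of very large draws, the result moves around noticeably from seed to seed unless the number of samples is large.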

At this point, we need to plug in two final, very important parameters. These are figures for the money moved (M), and for the cost of DCP doing an investigation (C). These are particularly important as our final estimate is going to be the average good done by an investigation (A*M), divided by the cost of doing it (C). Our final estimate will therefore be directly proportional to M, and inversely proportional to C. We'll discuss these a little more in a future post. For now, I'm just going to give you some figures that are hopefully about right: namely $530 million for M, and about $100,000 for C.
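Written as a formula, with the units spelled out (A in DALYs per dollar, M and C in dollars), the estimate is:

```latex
\text{effectiveness of donating to DCP}
  = \frac{A \times M}{C}
  \quad \left[ \frac{(\text{DALYs}/\$) \times \$}{\$} = \text{DALYs}/\$ \right]
```

The units check is worth a glance: the dollars in M cancel the "per dollar" in A, leaving DALYs gained per investigation, which is then divided by the dollar cost of that investigation.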

Putting all this together, we get an estimate for the effectiveness of DCP of around 89 DALYs/$. For comparison, the best intervention that DCP claims to have found so far clocks in at 0.33 DALYs/$! As mentioned earlier, of course, this estimate is only as good as our model, but it certainly looks promising. In the next post I'll talk about how we can improve the model in a couple of ways, most importantly including the possibility of error. This will lead to a significant reduction in our estimate of DCP's effectiveness, but it will still appear to be extremely competitive.
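As a quick sanity check on the arithmetic, the figures quoted above can be turned around to see what value of A they imply. The sketch below uses only numbers stated in the post (M, C, the 89 DALYs/$ estimate and the 0.33 DALYs/$ current best); the variable names are just labels for this example, and the post itself does not report A directly.

```python
money_moved = 530e6               # M, in dollars
cost_per_investigation = 100_000  # C, in dollars
reported_estimate = 89            # DALYs/$, the post's headline figure
current_best = 0.33               # DALYs/$, DCP's best intervention so far

# Each dollar spent on an investigation steers M / C dollars of donations.
leverage = money_moved / cost_per_investigation

# Rearranging estimate = A * M / C gives the implied average improvement A.
implied_A = reported_estimate / leverage

print(f"leverage M/C = {leverage:.0f}")
print(f"implied A = {implied_A:.4f} DALYs/$")
print(f"implied A as a share of the current best = {implied_A / current_best:.1%}")
```

In other words, the headline figure corresponds to an average investigation improving the top recommendation by around 5% of its current value, with that small gain then multiplied across all the money moved.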

A few final caveats:

- This kind of research will be a high-variance strategy: most of the time nothing useful will happen, but occasionally there will be really, really good outcomes.

- The hidden parameter in this model is the effectiveness of DCP's current strongest recommendation. In particular, if new interventions are discovered that have much higher effectiveness, then the need to do further research drops.

- The model assumes that the distribution of interventions DCP would evaluate with more funding looks the same as the distribution of interventions they have already identified. In practice they may have looked at the most promising options first.

- These results are not immediately applicable to charity-evaluators such as GiveWell. GiveWell looks at charities, not interventions, and we don't have a data-set for that in the way that we have for DCP. It may be that if we make some assumptions we might be able to adapt the data, but this is an open question.
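The second caveat, about the "hidden parameter", can be explored without any simulation. For a log-normal variable X with log-mean mu and log-standard-deviation sigma, the expected value of max(X - b, 0) has a standard closed form, which the sketch below uses to show how A shrinks as the current best option b improves. This is a swapped-in analytic shortcut rather than the sampling approach described in the post, and `mu` and `sigma` are placeholder values, not the fitted DCP2 parameters.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, current_best):
    """E[max(X - b, 0)] for X log-normal with parameters mu and sigma."""
    log_b = math.log(current_best)
    d1 = (mu + sigma ** 2 - log_b) / sigma
    d2 = (mu - log_b) / sigma
    mean_x = math.exp(mu + sigma ** 2 / 2.0)
    return mean_x * normal_cdf(d1) - current_best * normal_cdf(d2)


# Placeholder parameters (assumptions, not the fitted DCP2 values).
mu, sigma = math.log(0.01), 1.5
for best in (0.1, 0.33, 1.0):
    improvement = expected_improvement(mu, sigma, best)
    print(f"current best {best:>5} DALYs/$ -> A = {improvement:.5f}")
```

The output should fall as `best` rises: the better the top recommendation already is, the less an extra investigation is expected to add.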


Comments

Cool post, Michael. This looks promising, and I'm curious about future posts. It raises the question, though, does DCP take donations? This might be an interesting result for meta-intervention research in any case.

Impressive model! There are two further ways in which DCP helps influence donor decisions which the model misses out on, however. First, when DCP assesses very inefficient interventions, donors funding these interventions learn how their intervention compares with more efficient ones. Seeing their intervention ranked at the bottom of the efficiency scale might provide a far greater incentive to move to more efficient interventions than the discovery of an even more efficient intervention.
Moreover, many donors are strongly committed to fighting a particular disease or supporting a certain group. A donor committed to supporting women specifically will not change the structure of its funding because a new efficient vaccine against tuberculosis has been discovered. What such donors are interested in are interventions targeting the same group that are more effective. If these local comparisons do indeed play an important role, many of the interventions that turn out to be less efficient than the top intervention might still be the most efficient intervention a specific donor is ready to pursue.

Extremely cool model - nice work! Two things spring to mind (pre-emptive apologies for own idiocy):
1) Given log-normal distributions have no ceiling, if DCP evaluates enough interventions it is likely to eventually come across one hundreds of times better than the current top intervention. Given how much of your model is working at the right tail, this might lead to overestimating how much good more DCP would do if the ceiling is not that much higher than the current 'top recommendation'. A priori, there should be a ceiling somewhere (if nothing else, the DCP can only survey extant interventions, and so there's going to be a first ranked member of this finite set), but I have little idea how to model it. I'd be surprised if (limiting ourselves to global health) there is an intervention out there 1000x better than the DCP best bet, but I'm not mathsy enough to intuit whether that would slash the benefit predicted by the model. Maybe one could stick some sort of envelope on top of the right tail with guesstimates of how likely we think it is that there exists an intervention 10x, 100x etc. more effective, and see if that changes things?
2) From your comments (and the fact your model uses the current best intervention), you are trying to estimate how much good funding more DCP would do. I think it would also be interesting to look at how much good DCP has done in the first place, not only as another 'smallpox eradication' example of high impact, but also as a case study of the value of information.

Thanks for your comments, guys!

@Marco

It's true that this model doesn't really model the donors in any significant sense. However, while the effects that you mention are real, the donors that really matter for this estimate are the ones that really will stick with DCP's recommendations. Most of the expected value comes out of the odd, rare intervention that's way over to the right. The donors that matter are the ones who would be willing to give to even something quirky if it happened to be plausibly extremely cost-effective.

This does mean that it's harder to estimate the parameter M: it might be that many of the entities that appear to take DCP's recommendations into account are actually more interested in specific areas, in which case even if DCP does find a truly exceptional intervention, it would not get funded as much as the model assumes. Probably worth thinking about some more!

@Greg

a) Finitude isn't really a problem. Even if we think that there is only one intervention left in the world to survey, we still believe that the overall sample was taken from a log-normal distribution, and so *as far as we know* the last sample could be anything (with commensurate probability). That is, the distribution precisely *captures* how likely we think it is that there will be interventions 10x, 100x more effective. Of course, if we could come up with an independent upper bound, or had some reason to think that the distribution would deviate from a log-normal at the top, then that would change things. (Indeed, I think Toby believes that it's actually *better* than log-normal, with a fatter tail, given the existence of things like smallpox eradication, which would have been just unbelievably far off to the right).

b) That's a very interesting question, actually! I *think* SCI was founded at least partially due to DCP2, but I'm not totally sure. This whole model is an exercise in value of information calculations: we're essentially asking how much we would be willing to pay to get a chance of some more information.

Hello Michael. Thanks for getting back to me and clarifying what I should have meant by the ceiling issue. Despite examples like smallpox, I'm a bit surprised we can hope there will be interventions miles above the current 'best bet'. However, Toby knows much more about this than me, so I'll follow him. Look forward to seeing the other posts.

@Greg

Let me give you an example. Suppose we have a box that produces samples from a log-normal distribution - we know this. We take a bunch of samples and hide them away somewhere where we can only get at them with some work. Even if we've looked at all but one of the samples, we should still expect the last sample to look like a random sample from the log-normal distribution! Sure, it's actually a fixed value, but as far as we know, it's random.

In the same way, the evidence suggests that charity effectiveness looks like a sample from a log-normal. That's all we know. So, even if we've looked at almost every intervention already, we should still expect the last one to look like a random sample from the distribution. Unless, of course, we have some other information about the distribution!

Smallpox eradication should make us suspect that the distribution is better than log-normal, because even with a log-normal, our chances of having seen an intervention that far above the mean are incredibly low. So that increases the plausibility of a log-normal-with-fat-tail distribution.
