A Conversation with Holden Karnofsky, co-founder of GiveWell
Part I - GiveWell's Skepticism
Recently, Giving What We Can staff sat down with Holden Karnofsky, the co-founder of GiveWell. The goal was to critically examine GiveWell's methodology, research approach, and recommendations. The full interview can be downloaded as an MP3.
Giving What We Can's aim was to act as lawyers, not judges: to help GiveWell identify and fix any potential flaws and become more accurate. In playing 'devil's advocate', Giving What We Can might come across as more negative and skeptical than its actual view of GiveWell warrants.
Ultimately, these conversations are valuable because GiveWell relies on people to challenge its work, yet feedback loops are often scarce in the world of philanthropy. Giving What We Can's recommendations, which rely heavily on GiveWell's research, should also improve as a result. Likewise, the conversations should benefit the thousands of people who donate based on GiveWell's recommendations and the hundreds of people who have taken Giving What We Can's pledge.
Rob Wiblin: It seems that GiveWell has a pessimistic view of charities, assuming the worst of them unless there is strong evidence otherwise. Is that an accurate representation of GiveWell?
Holden Karnofsky: No, and this is where a lot of confusion comes from. We should distinguish between a "pessimistic attitude" and a "regressive attitude". A pessimistic attitude would say that charities are not doing good unless otherwise proven. A regressive attitude says that when you see a claim that someone is an outlier, having massively more positive (or negative) impact than the average charity, they are probably closer to the average than they claim to be. This "regressive" attitude, not the "pessimistic" one, is the attitude GiveWell has.
We at GiveWell are looking for the best opportunities. Most charities are doing good work. But when we run across something that looks really good, the most likely explanation is some combination of exaggeration, estimation error, and missing key pieces of information. However, as you gain more evidence and better knowledge of the field, this becomes much less of an issue.
In most areas of life, you assume that something that looks "too good to be true" probably is, unless proven otherwise. You don't just jump at any "get rich quick" scheme -- when you see something you don't understand, you don't take immediate action. Rather, you are unsure and investigate it further. There might be reasons why this analogy isn't perfect for charity, but the general principle applies to many parts of life.
And throughout our history at GiveWell, this approach has been confirmed as something that works: it moves us in the right direction relative to our initial estimates. It's not about being skeptical that charities are doing good.
Rob: Maybe a key disagreement between GiveWell and Giving What We Can is that GiveWell thinks we should shift our estimates downward in light of weak evidence more than Giving What We Can thinks we should. What empirical evidence do you have to back up this claim? And how do you quantify how much we should regress?
Holden: I published a piece called "Why We Can't Take Expected Value Estimates Literally (Even When They're Unbiased)", along with a spreadsheet that gives a sense of how GiveWell thinks regression should be done. There was also a follow-up called "Maximizing Cost-Effectiveness via Critical Inquiry".
The basic idea is that when you have a "back of the envelope calculation" (where the 95% confidence interval includes the possibility that the charity is no more cost-effective than average), it's hard for your adjusted estimate to reach or exceed two standard deviations above average, no matter what your explicit estimate was -- even if you initially estimated that your charity had an impact that was thousands of standard deviations better than average.
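To make the regression idea concrete, here is a minimal sketch in Python. It is not GiveWell's spreadsheet. It assumes a normal prior over charity cost-effectiveness (measured in prior standard deviations above average) and an unbiased estimate with normally distributed error, then combines the two with a standard precision-weighted average. The function names and the 1.96 cutoff for a 95% interval are illustrative choices, not taken from the interview.

```python
def posterior_mean(estimate, estimate_sd, prior_mean=0.0, prior_sd=1.0):
    """Mean of the posterior when a normal prior is combined with one
    unbiased, normally distributed estimate (precision-weighted average)."""
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    weighted_sum = prior_mean * prior_precision + estimate * estimate_precision
    return weighted_sum / (prior_precision + estimate_precision)


def smallest_sd_covering_average(estimate, prior_mean=0.0, z=1.96):
    """Smallest estimate-error SD for which the 95% interval
    (estimate +/- z * sd) still reaches the prior mean (the 'average')."""
    return abs(estimate - prior_mean) / z


# Assumption: claims are expressed in prior standard deviations above average.
# For each claim, use the most favorable error SD that still leaves "average"
# inside the 95% interval, then compute the regressed (posterior) estimate.
for claimed in (1.0, 1.96, 5.0, 1000.0):
    sd = smallest_sd_covering_average(claimed)
    adjusted = posterior_mean(claimed, sd)
    print(f"claimed {claimed:>7.2f} SDs above average -> adjusted {adjusted:.3f}")
```

Under these assumptions, the adjusted figure peaks just under one prior standard deviation and shrinks as the claim grows, because a more extreme claim needs a wider error bar for "average" to stay inside its 95% interval. That matches the point that such estimates struggle to reach two standard deviations above average.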
Owen Cotton-Barratt: GiveWell seems to imply that if the average is within the 95% confidence interval, the estimate will always turn out to be only two standard deviations above average at best. Is this always the case?
Holden: This is not always the case. It's only for estimates that are sloppy and involve a lot of guesswork. I wouldn't want to make this claim about things we understand much better, and that's why GiveWell places such a high priority on understanding things better.
Consider the concept of "broad market efficiency". Imagine you're looking at a field of research you don't understand very well and lots of people understand it better than you. If you find something really great, it's more likely the experts are ignoring it for a good reason.
But if you're the expert, can see things from all angles, and you know all the relevant evidence, and you've checked everything out, it's a completely different story. And some really good work has been done, so GiveWell isn't saying that if you think something's a good idea you're wrong. And this is all consistent with what the math says.
GiveWell has never once gone looking to "knock down" an estimate. Our ideal situation is that we find excellent, massively cost-effective interventions that are far better than anything else you could get. But throughout our history, we have found ourselves revising our estimates downward. For example, very early in our history, we heard that the Fred Hollows Foundation was curing blindness for $50 and we hoped it was true. But when we looked in depth into the numbers, we found that the figure was not well grounded.
Secondly, when VillageReach was our top charity, we estimated them at under $1,000 per life saved. But as we investigated further, that estimate kept going up, and ultimately we had to say that we weren't even sure if VillageReach was having an impact. And that was really tough for GiveWell because we were going against our first "big bet".
Thirdly, there is deworming. We were suspicious about the figure that deworming averts one DALY (disability-adjusted life year, roughly a year of healthy life) for every $3.41 spent. This number never looked right based on what we knew. When we looked into the spreadsheets behind the number, we found five spreadsheet errors, some overestimating and some underestimating cost-effectiveness. When those errors were all corrected, the "$3.41" estimate turned out to be off by a factor of 100. And we published all this work for the public.
Fourthly, a major foundation told us that Mothers2Mothers was having a huge impact, serving 20% of all eligible mothers in Sub-Saharan Africa. But when we looked into this, we found the charts didn't make any sense. Furthermore, if their claim were true, we should be able to see trends in country-level figures, and we did not. The country-level data flat out contradicted this claim, and we wrote about that on our blog.
I could give more examples, but this is a constant pattern: we find something, we get very excited, we look into it, and it doesn't come out looking as good as we thought.