by Quentin Wodon
In the first post of this series, I argued that impact evaluations can be highly valuable for organizations such as Rotary to assess the impact of innovative interventions that have the potential to be replicated and scaled up by others if successful. In the second post, I suggested that a range of techniques is available to implement impact evaluations. In this third and last post in the series, I would like to discuss four limits of impact evaluations: (1) limits to what can be randomized or quasi-randomized; (2) limits in terms of external validity; (3) limits in terms of explanation as opposed to attribution; and (4) limits in terms of short-term versus long-term effects.
Can Everything Be Randomized?
The gold standard for impact evaluations is the randomized controlled trial (RCT), as discussed in the second post in this series. When it is not feasible to randomize the beneficiaries of an intervention, statistical and econometric techniques can sometimes be used to assess impact through “quasi-randomization”. But not all types of interventions can be randomized or quasi-randomized. If one wants to assess the impact on households of a major policy change in a country, for example, this may be hard to randomize.
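To make the mechanics of an RCT concrete, here is a minimal sketch in Python of random assignment followed by a simple difference-in-means estimate of the treatment effect. The data are entirely made up for illustration: the 200 household IDs, the baseline score of 50, and the “true effect” of 5 points are all hypothetical, not drawn from any real evaluation.

```python
import random
import statistics

random.seed(42)

# Hypothetical pool of 200 eligible households (IDs only; made-up data).
households = list(range(200))

# Random assignment: shuffle, then put the first half in the treatment group.
random.shuffle(households)
treatment = set(households[:100])

def outcome(hh):
    """Simulated post-intervention score: noise around a baseline of 50,
    plus a hypothetical true effect of +5 for treated households."""
    base = random.gauss(50, 2)
    return base + (5 if hh in treatment else 0)

scores = {hh: outcome(hh) for hh in households}

treat_mean = statistics.mean(scores[h] for h in households if h in treatment)
control_mean = statistics.mean(scores[h] for h in households if h not in treatment)

# Because assignment was random, the difference in group means is an
# unbiased estimate of the average treatment effect.
estimated_effect = treat_mean - control_mean
print(round(estimated_effect, 1))
```

The point of the sketch is that the estimate recovers something close to the built-in effect only because assignment was random; the rest of this post is about situations where that randomization step is not available.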
One example would be the privatization of a large public company with a monopoly on the delivery of a specific good. The company can be privatized, but it is typically difficult to privatize only part of it, so assessing the impact of privatization on households may be hard in the absence of a good counterfactual. Another example would be a major change in the way public school teachers are evaluated or compensated nationally. Even with such reforms, it may at times be feasible to sequence the new policy, for example by covering some geographic areas first and not others, which can provide data and ways to assess impacts. But in many cases the choice is “all or nothing”. Under such circumstances, the techniques used for impact evaluations may not work. Some have argued that for many of the most important policies that affect development outcomes, the ability to randomize is the exception rather than the rule.
For the types of projects that most Rotary clubs implement, I doubt that randomization would be infeasible, at least at some level. This does not mean that all or even most of our projects should be evaluated. But we should recognize that most of our projects are small and local, which makes it easier to randomize (some of) them when appropriate for evaluation. For larger programs or policy changes, however, one must be aware that randomization or quasi-randomization is not always feasible.
Internal Versus External Validity
When RCTs or quasi-randomization are used to assess the impact of interventions, the evaluators often pay special attention to the internal validity of the evaluation. For example, are the control and treatment groups truly comparable, so that inferences about impact are legitimate? Careful evaluation design and research help in achieving internal validity.
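One routine way evaluators probe internal validity is a balance check: comparing the treatment and control groups on characteristics measured before the intervention, which randomization should have equalized on average. The sketch below illustrates the idea with made-up data; the 200 units, the baseline score of 100, and its spread are all hypothetical.

```python
import random
import statistics

random.seed(0)

# Hypothetical pre-intervention characteristic (e.g. a baseline test
# score) for 200 units; purely illustrative numbers.
baseline = [random.gauss(100, 5) for _ in range(200)]

# Random assignment: shuffle the indices and split them in half.
idx = list(range(200))
random.shuffle(idx)
treat_idx = set(idx[:100])

treat_mean = statistics.mean(baseline[i] for i in treat_idx)
control_mean = statistics.mean(baseline[i] for i in idx[100:])

# If randomization worked, the baseline means of the two groups should
# be close; a large gap would be a red flag for internal validity.
gap = abs(treat_mean - control_mean)
print(round(gap, 2))
```

In a real evaluation this comparison would be run across many baseline characteristics, usually with formal statistical tests rather than an eyeballed gap, but the logic is the same: comparable groups at baseline are what make inferences about impact legitimate.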
But while good evaluations can be trusted in terms of their internal validity, do the results also have external validity? Do they apply beyond the specific setting of the evaluation that was carried out? Consider the case of an NGO doing great work in an area of health through an innovative pilot program. If the NGO's innovative model is found to be successful and is scaled up by a Ministry of Health, will the same results be observed nationally? Or is there a risk that with the scale-up, some of the benefits observed in the pilot will vanish, perhaps because the staff of the Ministry of Health are not as well trained or as dedicated as the staff of the NGO? There have been cases in which pilots that showed great promise did not deliver the same results at scale.
Attribution Versus Explanation
Consider again the example of the dictionary project mentioned in the previous post. An impact evaluation could lead to the conclusion that the project improves some learning outcomes for children, or that it does not. Impact evaluations are great at attributing impacts and establishing cause and effect, but they do not necessarily tell us why an impact is observed or not. For that, an understanding of the context of the intervention is needed, and such context is often provided by so-called process evaluations, as opposed to impact evaluations. There is always a risk that an impact evaluation will be a black box: impacts can be attributed, but the reasons for success or failure may not be clear. This in turn can be problematic when scaling up programs that were successful as pilots. Scaling up often requires altering some parameters of the intervention that was evaluated, and without rich contextual knowledge, the consequences of those alterations may not be known.
Short-Term Versus Long-Term Effects
Another issue with impact evaluations is the time horizon to which they refer. Some interventions may have short-term positive impacts but no long-term gains. An evaluation carried out one or two years after an intervention may suggest positive impacts, but those could very well vanish after a few years. Conversely, other interventions may show no clear impact in the short term but positive impacts later on. Ideally, one would like information on both short-term and long-term impacts, but this may not be feasible. Most evaluations, by design, tend to look at short-term rather than long-term impacts.
Implications of this Discussion
The above remarks should make it clear that impact evaluations are no panacea. They can be very useful – and I believe that Rotary should invest more in them for innovative projects that could be scaled up by others if successful – but they are not appropriate for all projects, and they should be designed with care.
I hope that this three-part series has helped some of you to understand better why impact evaluations have become so popular in development and service work, but also why they require hard work to set up well. Again, if you are considering impact evaluations in your service work, please let me know, and feel free to comment and share your own experience on this topic.