Anna Egalite and Matthew Ackerman have a new study out that examines whether the matching methodology used by CREDO to evaluate charter schools is “a reasonable alternative when the gold standard is not feasible or possible.” They conclude that it is. Using data from FL, they consider and rebut a series of common criticisms that have been made against the CREDO methodology.
They find that matching each charter student to multiple comparison students produces results much like those from a single match. Matching on administrative classifications, such as special education and English language learner status, also does little to distort results, even though those classifications differ systematically across sectors. And more rigorous methodologies, like using exogenous instruments, yield results in FL similar to those produced by CREDO’s matching method.
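For readers unfamiliar with the technique, the core idea can be sketched in a few lines of code. This is a deliberately simplified, hypothetical nearest-neighbor match on a prior test score within exact demographic cells, not CREDO’s actual algorithm; the function and field names are illustrative assumptions.

```python
# Simplified matching sketch (illustrative only; NOT CREDO's actual method).
# Each charter ("treated") student is matched to the comparison student
# with the closest prior test score, restricted to students who share the
# same categorical "cell" (e.g., ELL status, special-education status).

def match_students(treated, comparison):
    """Return a dict mapping each treated student's id to the id of
    its nearest comparison match.

    Each student is a dict with 'id', 'prior_score', and 'cell'
    (a tuple of categorical flags that must match exactly)."""
    matches = {}
    for t in treated:
        # Keep only comparison students in the same exact cell.
        candidates = [c for c in comparison if c['cell'] == t['cell']]
        if not candidates:
            continue  # no valid match available in this cell
        # Choose the candidate with the smallest prior-score gap.
        best = min(candidates,
                   key=lambda c: abs(c['prior_score'] - t['prior_score']))
        matches[t['id']] = best['id']
    return matches
```

The point of the sketch is that the match is only as good as the variables it conditions on: any selection on characteristics outside `prior_score` and `cell` is invisible to the procedure, which is exactly the concern raised below.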
Anna and Matthew have done excellent work and convincingly demonstrated their case. Anna is a former student of mine, now an Assistant Professor at North Carolina State (after a post-doc at Harvard), and another former student, James (Lynn) Woodworth, is a researcher at CREDO and an author of reports using this methodology, so this superb analysis of CREDO’s approach fills me with pride in their accomplishments.
But I’m concerned that they or others may over-interpret what this study finds. It does not demonstrate that matching generally gives you the same result as randomized experiments or other gold standard methodologies. All that it demonstrates is that matching yielded similar results in this particular context. In this circumstance, the selection of students into charter schools did not produce important differences between treatment and control students on unobserved characteristics. And in this case, systematic differences in how charter and traditional public schools classify students into special ed, ELL, and free lunch did not bias the result. But the next time we use a matching methodology, the situation could be completely different. In the next matching study, the types of students who attend charters may be significantly different in unobserved ways and administrative classifications could produce strong bias.
People have a very bad habit of declaring that matching or another observational method is just as good as gold-standard research designs whenever the two produce similar results. They did this after Abdulkadiroğlu et al. produced their Boston charter results. But declaring the two methods equally good ignores why we have gold-standard research in the first place. The bias of observational methods is typically unobserved. And those biases certainly exist some of the time, even if they are not present all of the time. Finding similar results for matching methods in one circumstance does not erase this fact.
To their credit, Ackerman and Egalite are careful to emphasize that matching should only be considered when more rigorous approaches are not available. My strong preference is that we should avoid sub-par methodologies, especially when the same policy has been subject to at least some gold-standard evaluations. We don’t need a study on every charter school in every state. We should rely on the rigorous research where we have it and then extrapolate those results to other schools and states. I’d rather be guided by theory supported by rigorous evidence than demand sub-par evidence for all things. Demanding evidence for every school in every state gives us a false sense of confidence that we really know how each state and school are doing.
Unfortunately, in their drive to make “evidence-based” decisions and feel “scientific,” ed reform policymakers and leaders have demanded that evidence be produced for each school in each state. Some have gone so far as to demand evidence on the effectiveness of each teacher. We can’t produce rigorous evidence all of the time, so these demands for evidence are driving us toward lower quality research designs. That may produce unbiased results some of the time, but it certainly won’t all of the time. So, in the desire to be evidence-based and scientific we are likely to undermine the quality of evidence and science. Let’s stick to gold-standard work for policy questions where we have those studies.