CREDO has produced a slew of studies comparing test score outcomes for students in charter and traditional public schools. Those studies have come to dominate public policy and foundation discussions about charter schools and are sometimes thought to be the highest quality studies on charter effects. They are not.
We actually have more than a dozen random-assignment studies of charter school achievement effects. For a summary of what those gold-standard studies find, see this systematic review by Cheng, Hitt, Kisida, and Mills (or, if you have difficulty with the paywall, you can find an earlier working paper here).
CREDO’s research design is not gold standard. It’s not even silver. Maybe it’s formica. It would be understandable to be confused and think CREDO’s work was gold standard, given how much more people in policy circles talk about that research than about the set of gold-standard random-assignment experiments. And you might be further confused by the language CREDO uses when it describes its research design as comparing “virtual twins.”
CREDO’s methodology does not compare twins, virtual or otherwise. All they are doing is comparing students who are similar on a limited set of observable characteristics — race, age, gender, and prior achievement scores. “Matching” students on those observable characteristics is just as prone to selection bias as any other observational study that controls statistically for a handful of observed characteristics when comparing students who choose to be in different school sectors. That is, students who choose to attend charter schools are very likely to be different from those who choose to remain in traditional public schools in ways that are not captured by their race, age, gender, and prior test score. In particular, their desire to switch to a different kind of school may well be associated with developments in their life that might affect the future trajectory of their test scores. In short, school choice is prone to bias from selection in observational studies like CREDO.
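To make the selection-bias point concrete, here is a small simulation. It is entirely hypothetical — the variable names, the numbers, and the setup are illustrative assumptions, not CREDO’s data or method. The idea: give each student an observed prior score and an unobserved trait (call it motivation) that drives both the decision to choose a charter and future test score growth. Adjusting for the observed prior score, as a matching design does, cannot remove the bias from the unobserved trait, while a lottery can.

```python
# Hypothetical simulation: matching/controlling on observables cannot remove
# bias from unobserved selection, while random assignment can.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

prior = rng.normal(0, 1, n)         # observed baseline test score
motivation = rng.normal(0, 1, n)    # UNOBSERVED trait driving both choice and growth

# Self-selection: students with higher unobserved motivation choose charters.
chooses_charter = (0.8 * motivation + rng.normal(0, 1, n)) > 0

true_effect = 0.0                   # in this toy world, charters have NO real effect
outcome = prior + motivation + true_effect * chooses_charter + rng.normal(0, 1, n)

# "Virtual twin"-style comparison: adjust for the observed prior score
# (regression adjustment stands in for matching on observables here).
X = np.column_stack([np.ones(n), prior, chooses_charter.astype(float)])
beta_obs = np.linalg.lstsq(X, outcome, rcond=None)[0]
print(f"observational estimate of charter effect: {beta_obs[2]:.2f}")  # biased upward

# Random assignment: a lottery breaks the link between motivation and sector.
assigned = rng.random(n) > 0.5
outcome_rct = prior + motivation + true_effect * assigned + rng.normal(0, 1, n)
rct_est = outcome_rct[assigned].mean() - outcome_rct[~assigned].mean()
print(f"RCT estimate of charter effect: {rct_est:.2f}")  # close to the true zero
```

The observational comparison reports a large positive “charter effect” even though the true effect is zero, because motivated students sorted themselves into charters and the prior score can’t see that. The lottery comparison recovers an effect near zero.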
CREDO overstates the strength of its methodology by referring to its approach as one that compares “virtual twins.” They say: “a ‘virtual twin’ was constructed for each closure student by drawing on the available records of students with identical traits and identical or very similar baseline test scores.” (p. 3) It is probably unintentional, but this description gives the false impression that they are comparing “identical” students in different sectors. In reality they are only comparing students who are similar on a handful of observed characteristics. Ladner and I may both have beards, enjoy a malt beverage, and be interested in school choice, but that does not make us “twins,” nor would it be reasonable to describe us as having “identical traits.”
Unlike CREDO, gold-standard random-assignment studies are not subject to selection bias because only chance determines whether students end up in charter or traditional public schools. On average, the students being compared in randomized control trials (RCTs) really are identical on all characteristics, observed and unobserved. They truly are virtual twins.
Backers of CREDO can point to the fact that the CREDO methodology has produced results that are similar to experimental studies in a few locations and claim that selection bias must therefore not be an important problem. This is a faulty conclusion. Finding that CREDO’s observational method and randomized control trials sometimes produce similar results only proves that selection did not bias the results in those cases. In other cases charters may attract students who are very different in their future achievement trajectory and RCTs would produce results that are very different from an observational study. Online charters are likely a clear example of where this selection bias would be severe.
The problem is compounded by the fact that policymakers and foundation officials are too eager to use CREDO results for a number of reasons that have nothing to do with the quality of the methodology. Sometimes they want to use CREDO because it supports their preferred policy conclusions. They also have a strong preference for studies that name the city or state they are considering.
It’s as if Jonas Salk proved that the polio vaccine works in an RCT, but policymakers and foundation officials want to know if it prevents polio in New Orleans or Detroit. Rather than rely on a lower quality research design that mentions their town, policymakers and foundation officials should focus on the highest quality charter randomized control trials, of which we have more than a dozen. If that evidence shows the polio vaccine to work, then they should assume it also works in their town.
“Maybe it’s formica” A+
So if I understand this issue correctly, school choice studies (if they want to be reliable) need to do the following:
-Utilize random assignment when comparing student samples.
-Check the progress of charter schools/choice programs over time (since they tend to get the hang of things in the first few years of operation).
Are there other major things to control for that the two procedural issues above may not take into account?
Randomized experiments allow us to draw causal inferences with the highest confidence. Given that we have more than a dozen such studies of charter schools, I’m not sure why we would pay much attention to other studies that lack the same ability to say that charter schools actually cause certain outcomes. I haven’t thought through what else you might want to see in charter studies, but I do know that we should prefer those that permit strong causal conclusions.
Do I get to be the “evil” virtual twin?
Sure. Just make your eyebrows darker and maybe change the beard to a goatee.
Pop culture reference A
External validity matters too, especially to policymakers, who care less about whether a policy works somewhere else than about whether it works in their own domain. CREDO has more external validity than a smattering of RCTs, even if its method cannot eliminate selection bias.
One can argue the tradeoff between external and internal validity, and the development literature has had a long and spirited debate about it. Deaton and Cartwright’s recent paper covers that debate. http://www.nber.org/papers/w22595.pdf. They also argue that the internal validity of RCTs is not worth much if the mechanisms yielding the effects are not identified.
I am a big fan of experiments, and I am not suggesting nonexperiments should be given priority over experiments. But let’s recognize that we will never have the first-best solution of all charter schools or even a national sample of charter schools being assessed with experiments. Criticizing CREDO because it is not doing the impossible seems unfair.
The CREDO results can be overweighted by policymakers, especially when the results point in the ‘right’ direction. But, two points. First, the likely alternative to policymakers overweighting (or underweighting) results is that they use information of even less scientific merit to arrive at policies. Hmm. And, second, we have all seen studies being embraced or rejected by policymakers for reasons that have little to do with scientific merit, but that does not mean we should fault the scientists doing the studies unless they endeavor to mislead.
Giving priority to external validity over internal validity is like the old joke about the food being bad but at least there are large portions. Why would we be happy about potentially having the wrong causal claim because we could apply that wrong conclusion to more people? This is worthy of a longer discussion which I’ll do in a future post.