Josh Angrist, the MIT economist and a leading voice on research methods and education policy, has a recent piece in Forbes in which he praises the dawning of a new era in which policymakers are guided by economists conducting experimental analyses of promising education reforms. He writes:
Alas, school reform has rarely been grounded in the sort of empirical analysis required of a new drug or medical treatment. Many educational innovations are propelled primarily by a politician or philanthropist’s good feelings. It shouldn’t surprise us that weakly researched innovations often lead to disappointing results. But this unscientific approach is now changing. America’s large urban districts are piloting new models for education delivery, such as small schools, charter schools, various sorts of magnet programs, and vouchers. Importantly, these innovations are often deployed through experiments… Economists nowadays use these experiments to provide credible, non-partisan evidence on the consequences of school reform.
To be sure, experimental methods are the best way to identify causal effects, and most of my own research uses this approach. Unfortunately, this improvement in methods does not always yield credible and non-partisan evidence because it is all too common for researchers to misinterpret the policy implications of these experiments, even when they are properly conducted. Several examples of this type of misinterpretation can be found in Angrist’s brief Forbes article. I’ll pick one to illustrate the point.
One of Angrist’s claims is that a certain type of charter school has been demonstrated as an effective policy with this rigorous new approach to research: “I’ve seen compelling evidence that urban charter schools emphasizing high expectations and data-driven instruction are winners, capable of closing the black-white achievement gap in just a few years.” The evidence to which Angrist is primarily referring is the experimental evaluation of Boston charter schools in which he has been involved with several co-authors. That research has shown large test score gains among students admitted to those Boston charters by lottery relative to those not admitted.
The problem is that increasing test scores does not necessarily mean that a policy is a “winner.” Test scores are an imperfect proxy for a set of knowledge and skills that we hope translate into greater educational and life success for students. Unfortunately, a growing body of research is showing a disconnect between changing test scores and changing later life outcomes for students. But we don’t have to look across the entire research literature to find examples of this disconnect. We can find evidence of it in the very Boston charter schools on which Angrist relies for his claim.
A new study by one of Angrist’s former students, Elizabeth Setren, examined test scores for students admitted by lottery to Boston charters but also tracked those students all the way through college completion. The main purpose of her study was to disaggregate effects for special needs and English language learner (ELL) students, so she never actually reports the combined results for all students. But we can see from the results for general education students, who comprise the vast majority of students in the study, what the overall results must be.
Like Angrist’s previous research, Setren finds large test score gains for students admitted to Boston charter schools by lottery. As shown in Table 4, general education students admitted to Boston charters benefit by .268 standard deviations (sd) on math tests and .163 sd on English Language Arts tests. ELL and special ed charter students show similar test score benefits. But as shown in Table 5, Boston charter school students are no more likely to graduate from high school than the lotteried control group, even five or six years after starting high school.
In Table 6, we can see that despite this lack of improvement in high school graduation rates, Boston charters are more likely to have their general education students enroll in post-secondary education, driven largely by an increase in enrollment in 4-year institutions with a possible decline in enrollment in 2-year schools. Boston charters’ special needs students show no statistically significant increase in post-secondary enrollment. Toward the bottom of Table 6 we can see college completion rates. Neither special needs nor general education students are more likely to complete a post-secondary degree in 4 years than the control group of students denied admission to Boston charters by lottery. In fact, the estimated effect for general education students is negative, but not statistically significant.
So, the overall picture does not show a policy that is a “winner.” One of Angrist’s former students, using the type of experimental method he endorses to examine the policy he claims is proven to work, actually shows that in the long run the policy may produce no benefits or may even produce a harm. General education students admitted by lottery to Boston charters do experience large test score benefits, but they are no more likely to graduate from high school. Those students are also more likely to enroll in post-secondary education but no more likely to obtain a post-secondary credential than the control group. Students who take out loans to enroll in college but do not finish may be worse off, so this pattern of results may suggest that Boston charters actually harm their students’ long-term educational outcomes.
And once again large gains in test scores are not a reliable proxy for improvement in later life outcomes. In the Forbes piece, Angrist suggests otherwise: “Though imperfect, test-based measures of value-added predict gains in important economic outcomes like college enrollment and earnings.” Notice the rhetorical sleight of hand in Angrist’s claim. The issue is not whether test scores are correlated with later life outcomes but whether rigorously identified changes in test scores produced by policy interventions translate into later changes in life outcomes. In the case of Boston charters, changes in test scores are not consistent with changes in later life outcomes, at least for general education students who constitute the bulk of the program.
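The distinction between correlation and policy-induced change can be made concrete with a toy simulation. The sketch below is purely illustrative: every number in it is invented, it is not drawn from Setren’s data, and it assumes one hypothetical mechanism (a lottery-assigned program that raises test scores without raising the underlying skill that drives later outcomes). Under that assumption, scores and outcomes are strongly correlated across students, yet the experimentally identified gain in scores comes with no gain in outcomes.

```python
import random
import statistics

def simulate(n=20000, boost=0.27, seed=0):
    """Toy data: all parameters are hypothetical, not estimates from any study."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        skill = rng.gauss(0, 1)            # latent skill, never observed directly
        treated = rng.random() < 0.5       # lottery assignment
        # Assumption: the program lifts the test score but not the skill itself.
        score = skill + rng.gauss(0, 1) + (boost if treated else 0.0)
        outcome = skill + rng.gauss(0, 1)  # later-life outcome depends on skill only
        rows.append((treated, score, outcome))
    return rows

def mean_diff(rows, idx):
    """Treatment-minus-control difference in means for column idx."""
    treated = [r[idx] for r in rows if r[0]]
    control = [r[idx] for r in rows if not r[0]]
    return statistics.fmean(treated) - statistics.fmean(control)

def pearson(xs, ys):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rows = simulate()
score_effect = mean_diff(rows, 1)    # experimental effect on test scores
outcome_effect = mean_diff(rows, 2)  # experimental effect on later outcomes
corr = pearson([r[1] for r in rows], [r[2] for r in rows])

print(f"correlation(score, outcome): {corr:.2f}")
print(f"lottery effect on score:     {score_effect:.2f}")
print(f"lottery effect on outcome:   {outcome_effect:.2f}")
```

Whether real programs behave like this is exactly the empirical question. The simulation shows only that a strong score-outcome correlation does not, by itself, tell us what a policy-induced score gain will do.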
Angrist is right that experiments are good and useful. But he is wrong about the dawning of a new age of science-driven education policymaking. Science is only as good as the proxies we use for outcomes we may really care about and only as reliable as the accuracy with which the scientists describe the research literature. So, when economists come to policymakers to say that science has spoken and we now know what works, policymakers have every reason to retain some skepticism.
Updated post note: The original version of this post noted that special needs students failed to improve on test scores but did show a higher likelihood of completing a 2-year college. That was a correct reading of the results as displayed in Table 4, but was inconsistent with the text of the paper. The author has acknowledged that the table was in error, so I have modified the post to reflect her corrected results.