Beware of Economists Bearing Evidence


Josh Angrist, the MIT economist and a leading voice on research methods and education policy, has a recent piece in Forbes in which he praises the dawning of a new era in which policymakers are guided by economists conducting experimental analyses of promising education reforms. He writes:

Alas, school reform has rarely been grounded in the sort of empirical analysis required of a new drug or medical treatment. Many educational innovations are propelled primarily by a politician or philanthropist’s good feelings. It shouldn’t surprise us that weakly researched innovations often lead to disappointing results. But this unscientific approach is now changing. America’s large urban districts are piloting new models for education delivery, such as small schools, charter schools, various sorts of magnet programs, and vouchers. Importantly, these innovations are often deployed through experiments… Economists nowadays use these experiments to provide credible, non-partisan evidence on the consequences of school reform.

To be sure, experimental methods are the best way to identify causal effects, and most of my own research uses this approach. Unfortunately, this improvement in methods does not always yield credible and non-partisan evidence because it is all too common for researchers to misinterpret the policy implications of these experiments, even when they are properly conducted. Several examples of this type of misinterpretation can be found in Angrist’s brief Forbes article. I’ll pick one to illustrate the point.

One of Angrist’s claims is that a certain type of charter school has been demonstrated as an effective policy with this rigorous new approach to research: “I’ve seen compelling evidence that urban charter schools emphasizing high expectations and data-driven instruction are winners, capable of closing the black-white achievement gap in just a few years.” The evidence to which Angrist is primarily referring is the experimental evaluation of Boston charter schools in which he has been involved with several co-authors. That research has shown large test score gains among students admitted to those Boston charters by lottery relative to those not admitted.

The problem is that increasing test scores does not necessarily mean that a policy is a “winner.” Test scores are an imperfect proxy for a set of knowledge and skills that we hope translate into greater educational and life success for students. Unfortunately, a growing body of research is showing a disconnect between changes in test scores and changes in later life outcomes. But we don’t have to survey the entire research literature to find examples of this disconnect. We can find evidence of it in the very Boston charter schools on which Angrist relies for his claim.

A new study by one of Angrist’s former students, Elizabeth Setren, examined test scores for students admitted by lottery to Boston charters but also tracked those students all the way through college completion. The main purpose of her study was to disaggregate effects for special needs and English language learner (ELL) students, so she never actually reports the combined results for all students. But we can see from the results for general education students, who comprise the vast majority of students in the study, what the overall results must be.

Like Angrist’s previous research, Setren finds large test score gains for students admitted to Boston charter schools by lottery. As shown in Table 4, general education students admitted to Boston charters benefit by .268 standard deviations (sd) on math tests and .163 sd on English Language Arts tests. ELL and special ed charter students show similar test score benefits. But as shown in Table 5, Boston charter school students are no more likely to graduate from high school than the lotteried control group, even five or six years after starting high school.

In Table 6, we can see that despite this lack of improvement in high school graduation rates, Boston charters are more likely to have their general education students enroll in post-secondary education, driven largely by an increase in enrollment in 4-year institutions with a possible decline in enrollment in 2-year schools. Boston charters’ special needs students show no statistically significant increase in post-secondary enrollment. Toward the bottom of Table 6 we can see college completion rates. Neither special needs nor general education students are more likely to complete a post-secondary degree in 4 years than the control group of students denied admission to Boston charters by lottery. In fact, the estimated effect for general education students is negative, but not statistically significant.

So, the overall picture does not show a policy that is a “winner.” One of Angrist’s former students, using the type of experimental method he endorses to examine the policy he claims is proven to work, actually shows that in the long run the policy may produce no benefits or may even cause harm. General education students admitted by lottery to Boston charters do experience large test score benefits, but they are no more likely to graduate high school. Those students are also more likely to enroll in post-secondary education but no more likely to obtain a post-secondary credential than the control group. Students who take out loans to enroll in college but do not finish it may be worse off, so this pattern of results may suggest that Boston charters actually harm their students’ long-term educational outcomes.

And once again large gains in test scores are not a reliable proxy for improvement in later life outcomes. In the Forbes piece, Angrist suggests otherwise: “Though imperfect, test-based measures of value-added predict gains in important economic outcomes like college enrollment and earnings.” Notice the rhetorical sleight of hand in Angrist’s claim. The issue is not whether test scores are correlated with later life outcomes but whether rigorously identified changes in test scores produced by policy interventions translate into later changes in life outcomes. In the case of Boston charters, changes in test scores are not consistent with changes in later life outcomes, at least for general education students who constitute the bulk of the program.
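The distinction between correlation and policy-induced change can be made concrete with a toy simulation. The sketch below is purely hypothetical and is not drawn from the Boston data or from Setren's paper: it assumes a lottery in which admission raises test scores by a fixed amount (`score_boost`, an invented parameter) without changing the latent skill that drives degree completion. Every number in it is an illustrative assumption.

```python
import random
from statistics import mean


def simulate(n=20000, score_boost=0.3, seed=0):
    """Hypothetical lottery: admission raises scores but not latent skill."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        admitted = rng.random() < 0.5   # lottery assignment
        skill = rng.gauss(0, 1)         # latent skill, untouched by admission (assumption)
        # The observed score reflects skill, noise, and a treatment-only boost.
        score = skill + rng.gauss(0, 0.5) + (score_boost if admitted else 0.0)
        # Completion depends on skill and luck, not on the score itself.
        completes = 1 if skill + rng.gauss(0, 1) > 0.5 else 0
        rows.append((admitted, score, completes))
    return rows


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rows = simulate()
treated = [r for r in rows if r[0]]
control = [r for r in rows if not r[0]]

# Experimental (lottery-based) estimates: difference in group means.
score_effect = mean(r[1] for r in treated) - mean(r[1] for r in control)
completion_effect = mean(r[2] for r in treated) - mean(r[2] for r in control)

# Cross-sectional association between scores and completion.
score_outcome_corr = pearson([r[1] for r in rows], [r[2] for r in rows])

print("effect of admission on scores:    ", round(score_effect, 3))
print("effect of admission on completion:", round(completion_effect, 3))
print("correlation of score & completion:", round(score_outcome_corr, 3))
```

Under these assumptions, scores and completion are clearly correlated across students, and admission clearly raises scores, yet admission has essentially no effect on completion. The simulation does not show that this is what happened in Boston. It shows only that a strong correlation between test scores and later outcomes is logically compatible with a score-raising policy that leaves those outcomes unchanged.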

Angrist is right that experiments are good and useful. But he is wrong about the dawning of a new age of science-driven education policymaking. Science is only as good as the proxies we use for outcomes we may really care about and only as reliable as the accuracy with which the scientists describe the research literature. So, when economists come to policymakers to say that science has spoken and we now know what works, policymakers have every reason to retain some skepticism.

Updated post note: The original version of this post noted that special needs students failed to improve on test scores but did show a higher likelihood of completing a 2-year college. That was a correct reading of the results as displayed in Table 4, but was inconsistent with the text of the paper. The author has acknowledged that the table was in error, so I have modified the post to reflect her corrected results.

This entry was posted on Thursday, July 25th, 2019 at 11:11 am and is filed under Uncategorized.

4 Responses to Beware of Economists Bearing Evidence

Michael Shaughnessy says:

July 25, 2019 at 11:22 am

Elizabeth Setren’s study is important for one specific key word: disaggregate. What “works” for a student with an I.Q. of 125 may not work for a student with a learning disability, a student with medical or health needs, or an English language learner. The other key word (that few like to hear or understand) is heterogeneity. The more heterogeneity in the school, class, or even state, the more difficult it is to come to real conclusions and to make generalizations. Hawaii is different from Alaska. This may sound simplistic, but the populations of these two states are quite different, and the implementation of any experiment is likely to be difficult if not different.

Mark Dynarski says:

July 25, 2019 at 1:44 pm

Two thoughts. One is that the title and the piece are examples of a fallacy of composition. That Josh Angrist is an economist does not mean educators should be wary of economists as a group.

The second thought is that, relationships between test scores and later outcomes aside, what the Forbes piece fails to note is that charters are exceptions in studying reforms. Many states require charters to use lotteries when they are oversubscribed, enabling researchers to exploit the lotteries and use experimental methods to measure effects of charter schools. Voucher programs such as in DC and Louisiana likewise use lotteries and have been studied with experimental designs.

Contrast this with school turnarounds, which are a crucially important policy issue but cannot be studied with experimental designs. To this day it’s hard to know how best to turn around a failing school.

When whatever is new to education is first being tested rigorously, we’ll know the scientific era has arrived. We are not even close now.

- Jay P. Greene says:
  
  July 25, 2019 at 6:04 pm
  
  Both are good points. I only used the title to have a bit of fun. And if Angrist can take collective credit only for economists for conducting policy experiments, excluding most of the experiments that have been done on vouchers and other policies, I figured I could attribute skepticism to that collective.
  
  - Mark Dynarski says:
    
    July 26, 2019 at 8:47 am
    
    True. It would be hubris for economists to accept credit for introducing experiments to education, when psychologists such as Donald Campbell and Bob Boruch were there decades before.
    
    Skepticism about the value of test scores is warranted by findings showing low correlations of scores with later outcomes that are more impactful, like college graduation. But I don’t think the preK to 12 enterprise is structured for outcomes other than test scores, or, for higher grade levels, course credits. Researchers are not off base to study scores, but they may not have as complete a picture as would be desirable of the importance of scores or the lack thereof.

Jay P. Greene's Blog