
Some researchers and journalists have become very excited about a new set of studies that claim to find a causal relationship between increasing school spending and improving student outcomes. These folks acknowledge that the vast majority of earlier research found no relationship between additional resources and stronger results, but that research was purely observational. Perhaps school systems with weaker outcomes tend to get a larger share of increased spending, creating the false impression that more money doesn’t help. That is, perhaps bad outcomes often cause more spending, not the other way around.
There is a new wave of research that claims to identify the causal relationship between school spending and student outcomes, and those new results are much more positive. The problem is that the new research pretty clearly falls short of having strong causal research designs. Instead, it seems simply to replace the old non-causal methods with new non-causal methods that carry a different potential direction of bias.
The new “causal” studies generally come in two types: regression discontinuity (RD) studies of bond referenda and instrumental variable (IV) analyses of court-ordered spending increases. RD and IV designs can produce results that approximate a randomized experiment and can be thought of as causal. However, the RD and IV studies in this new literature generally fail to meet the requirements for those designs to approximate randomized experiments effectively. That is, the new “causal” research on school spending is not really causal.
To illustrate the problem with the use of RD to study bond referenda, let’s look at the study that was just published in the Journal of Public Economics (JPE), a very high-status journal. A working paper version of this study that is not behind a pay wall can also be found here. The idea of this RD, like others in the new school spending literature, is that bond referenda that barely pass and those that barely fail can be treated as approximating a randomized experiment. That is, there is a large element of luck in whether a bond barely passes or not, so by chance some schools get extra money and others do not. If those that get that extra money by luck produce better student outcomes over time than those that don’t get the extra money by chance, then we can say that money — and not other factors — caused the change in outcomes.
The JPE study, like most of the other RD studies in this new literature, falls short of approximating a randomized experiment in two ways. First, we can only view RD results as causal if the set of observations examined is narrow enough that we can plausibly think it is effectively chance whether the treatment is received or not. But the JPE study defines bond referenda as “near the threshold” for passing if they are within 20 percentage points of the vote share required for passage. That is, if 50% is needed to pass a referendum, the JPE study would define the election as “near the threshold” if the bond received between 30% and 70% of the vote. This bandwidth is so wide that it includes almost two-thirds of all bond referenda in the states the authors examine. To call this “near the threshold” is misleading. And it is simply implausible to think of any outcome between 30% and 70% of the vote as a matter of luck.
Second, we can only view RD results as causal if actors have no control over which side of the threshold they fall on. In the case of bond referenda, that requirement is clearly violated. Districts choose whether and when to hold a referendum, and they do so based on their estimated likelihood of prevailing. In addition, districts try to keep a finger on the pulse of the campaign and can adjust the effort they and their allies make to improve the chances of victory. In sum, whether districts win or lose referenda is partially a function of their political competence and resources. The researchers cannot observe or control for these qualities, yet they are likely to be associated with changes in student outcomes over time.
The IV studies in this new literature are no better at approximating randomized experiments. For IV research designs to produce causal results, they need an exogenous instrument: something that predicts whether schools get more money or not, but that is uncorrelated, both theoretically and empirically, with later student outcomes except through that money. The details vary across studies, but the general approach of the IV studies in this literature is to treat court-ordered spending increases as exogenous. That is, the researchers have to believe that legislatively adopted spending increases, which past studies primarily relied upon, risk reverse causation, while court-ordered spending increases are fundamentally different. Court-ordered spending has to be thought of as manna from heaven, dropping on schools as if at random. At the very least, we have to believe that court-ordered spending differs from the regular legislative kind in that it has nothing to do with factors that contribute to improved student outcomes.
It is clear that court-ordered spending increases are not exogenous and are not fundamentally different from the regular legislative kind. Courts are political actors, just like legislatures. Whether and when the courts order spending increases is at least partially a function of a broader political conviction in a state that more resources are available and should be devoted to schools. That conviction is just as likely to be associated with future improvement in student outcomes when it is expressed by the courts as when it is expressed by the legislature.
Both the RD and IV studies in this new literature try to justify their causal claims with empirical evidence that treatment and control groups were similar before spending increased. But they can only make those comparisons on observable qualities, which is precisely what prior observational studies do. These studies need to justify theoretically that their approaches approximate random assignment, and they cannot do so persuasively. Whether bond referenda pass or fail, especially by large margins, is not random. Whether and when courts order spending increases is not random either, or at least it is no more or less random than when legislatures do it.
If these new RD and IV studies cannot persuasively argue that their approach approximates randomization, then their results are no more causal than the prior observational literature that showed no relationship between spending increases and improved student outcomes. The promoters of this new school spending research are right to note the flaws of earlier studies, but they are insufficiently aware of the flaws in the new research.
Given the causal weakness of both literatures, we should probably take a step back and ask whether either one better fits our informal observation of the world. As Rick Hanushek has noted, if the new research is right in its causal claims about more money improving outcomes, why have huge spending increases over decades not been associated with the kinds of improvements the “causal” research claims to find?