The New “Causal” Research on School Spending is Not Causal


Some researchers and journalists have become very excited about a new set of studies that claim to find a causal relationship between increasing school spending and improving student outcomes.  These folks acknowledge that the vast majority of earlier research found no relationship between additional resources and stronger results, but that research was purely observational.  Perhaps school systems with weaker outcomes tend to get a larger share of increased spending, creating the false impression that more money doesn’t help.  That is, perhaps bad outcomes often cause more money, not the other way around.

There is a new wave of research that claims to find the causal relationship between school spending and student outcomes and those new results are much more positive.  The problem is that the new research pretty clearly falls short of having strong causal research designs.  Instead, the new research just seems to be substituting different non-causal methods with a different potential direction of bias for the old ones.

The new “causal” studies generally come in two types — regression discontinuity (RD) studies of bond referenda and instrumental variable (IV) analyses of court-ordered spending increases.  While RD and IV designs can produce results that approximate a randomized experiment and can be thought of as causal, the RD and IV studies in this new literature generally fail to meet the requirements for those designs to effectively approximate randomized experiments.  That is, the new “causal” research on school spending is not really causal.

To illustrate the problem with the use of RD to study bond referenda, let’s look at the study that was just published in the Journal of Public Economics (JPE), a very high-status journal. A working paper version of this study that is not behind a pay wall can also be found here. The idea of this RD, like others in the new school spending literature, is that bond referenda that barely pass and those that barely fail can be treated as approximating a randomized experiment.  That is, there is a large element of luck in whether a bond barely passes or not, so by chance some schools get extra money and others do not.  If those that get that extra money by luck produce better student outcomes over time than those that don’t get the extra money by chance, then we can say that money — and not other factors — caused the change in outcomes.

The JPE study, like most of the other RD studies in this new literature, falls short of approximating a randomized experiment in two ways.  First, we can only view RD results as causal if the set of observations examined is sufficiently narrow that we can plausibly think that it is effectively chance whether the treatment is received or not.  But the JPE study defines bond referenda as “near the threshold” for passing if they are within 20 percentage points of the vote share required for passage.  That is, if 50% is needed to pass a referendum, the JPE study would define the election as “near the threshold” if the bond received between 30% and 70% of the vote.  This bandwidth is so wide that it includes almost two-thirds of all bond referenda in the states they examine.  To call this “near the threshold” is misleading.  And it is simply implausible to think of any outcome between receiving 30% and 70% of the vote as a matter of luck.
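To make the bandwidth concern concrete, here is a minimal simulation sketch.  It is not the JPE study's estimator or data.  Everything in it is an assumption chosen for illustration: the function names, the smooth tanh relationship between vote margin and later outcomes, the noise level, and a true effect of passing that is set to exactly zero.  It fits a straight line on each side of the threshold and reports the gap between the two lines at a margin of zero, first with a 20-point bandwidth and then with a 2-point bandwidth.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rd_jump(margins, outcomes, bandwidth):
    """Local linear RD estimate: gap between the two fitted lines at margin 0."""
    pairs = list(zip(margins, outcomes))
    left = [(m, y) for m, y in pairs if -bandwidth <= m < 0]
    right = [(m, y) for m, y in pairs if 0 <= m <= bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left


# Hypothetical data: margin = vote share minus the share needed to pass.
# Outcomes rise smoothly with the margin (standing in for unobserved district
# traits) and have NO discontinuity at zero, so the true effect of passing is 0.
rng = random.Random(0)
margins = [rng.uniform(-0.5, 0.5) for _ in range(20000)]
outcomes = [math.tanh(10 * m) + rng.gauss(0, 0.1) for m in margins]

wide_jump = rd_jump(margins, outcomes, 0.20)    # 30% to 70% of the vote
narrow_jump = rd_jump(margins, outcomes, 0.02)  # 48% to 52% of the vote

print(f"estimated jump, 20-point bandwidth: {wide_jump:.3f}")
print(f"estimated jump,  2-point bandwidth: {narrow_jump:.3f}")
```

Under these assumptions the wide bandwidth reports a sizable jump even though none exists, because a straight line cannot follow the curve far from the threshold, while the narrow bandwidth stays close to zero.  A more flexible functional form could shrink the gap in this toy case, but the reader then has to trust that the chosen form is the right one.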

Second, we can only view RD results as causal if actors have no control over whether they fall on one side or another of the threshold.  In the case of bond referenda that requirement is clearly violated.  Districts choose whether and when to hold a referendum and they do so based on their estimated likelihood of prevailing.  In addition, districts try to have a finger on the pulse of the campaign and can alter the effort by them and their allies to improve the chances of victory.  In sum, whether districts win or lose referenda is partially a function of their political competence and resources, which are qualities that the researchers cannot observe or control and yet are likely to be associated with changes in student outcomes over time.

The IV studies in this new literature are no better at approximating randomized experiments.  For IV research designs to produce causal results, they need to have an exogenous instrument: something that predicts whether schools get more money or not, but which is uncorrelated theoretically and empirically with later student outcomes.  While the details vary across studies, the general approach of the IV studies in this literature is to treat court-ordered spending increases as exogenous.  That is, they have to believe that legislatively adopted spending increases, which past studies primarily relied upon, risk reverse causation, but court-ordered spending increases are fundamentally different.  Court-ordered spending has to be thought of as manna from heaven, dropping on schools as if at random.  At the very least we have to believe that court-ordered spending differs from the regular legislative kind in that it has nothing to do with factors that contribute to improved student outcomes.

It is clear that court-ordered spending increases are not exogenous and are not fundamentally different from the regular legislative kind.  Courts are political actors, just like legislatures, and whether and when the courts order spending increases is at least partially a function of a broader political conviction in a state that more resources are available and should be devoted to schools.  That conviction is just as likely to be associated with future improvement in student outcomes if it is expressed by the courts as if it is expressed by the legislature.
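The stakes of the exogeneity requirement can be sketched with a small simulation.  This is an illustration under assumed numbers, not a reconstruction of any study in this literature.  The variable names, the coefficients, and the unobserved "conviction" factor are all hypothetical, and the true effect of spending on outcomes is set to zero.  The estimator is the simple IV ratio cov(z, y) / cov(z, x), where z is a court order, x is spending, and y is the later outcome.

```python
import random

TRUE_EFFECT = 0.0  # assumed: spending has no effect on outcomes in this toy world


def covariance(a, b):
    """Population covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n


def iv_estimate(z, x, y):
    """Simple IV (Wald) ratio: cov(instrument, outcome) / cov(instrument, treatment)."""
    return covariance(z, y) / covariance(z, x)


def simulate(court_tracks_conviction, n=20000, seed=0):
    """Return the IV estimate of the spending effect for one simulated sample."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        # Unobserved statewide political conviction that schools deserve more.
        conviction = rng.gauss(0, 1)
        # Court order is either pure luck or partly driven by that conviction.
        pull = conviction if court_tracks_conviction else 0.0
        court_order = 1.0 if pull + rng.gauss(0, 1) > 0 else 0.0
        spending = court_order + 0.5 * conviction + rng.gauss(0, 1)
        # Conviction improves outcomes directly; spending contributes TRUE_EFFECT.
        outcome = TRUE_EFFECT * spending + conviction + rng.gauss(0, 1)
        z.append(court_order)
        x.append(spending)
        y.append(outcome)
    return iv_estimate(z, x, y)


iv_exogenous = simulate(court_tracks_conviction=False)
iv_endogenous = simulate(court_tracks_conviction=True)

print(f"IV estimate, court orders as good as random: {iv_exogenous:.3f}")
print(f"IV estimate, court orders track conviction:  {iv_endogenous:.3f}")
```

When court orders are assigned by luck, the ratio lands near the true effect of zero.  When court orders partly reflect the same conviction that improves outcomes, the same ratio reports a large positive "effect" of spending that does not exist.  The size of that bias depends entirely on the assumed coefficients, which is the point: with real data we cannot observe them.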

Both RD and IV studies in this new literature attempt to justify their causal claims with empirical evidence that treatment and control groups were similar before spending increased.  But they can only compare on observable qualities, which is precisely what prior observational studies do.  These studies need to be able to justify theoretically that their approaches approximate random assignment, but they cannot do so persuasively.  Whether bond referenda pass or fail, especially by large margins, is not random.  And whether and when courts order spending increases is also not random, at least no more or less so than when legislatures do it.

If these new RD and IV studies cannot persuasively argue that their approach approximates randomization, then their results are not more causal than the prior observational literature that showed no relationship between spending increases and improved student outcomes.  The promoters of this new school spending research are right to note the flaws of earlier studies, but they are insufficiently aware of flaws in the new research as well.

Given the causal weakness of both literatures, we should probably take a step back and see if either better conforms with our non-rigorous observation of the world.  As Rick Hanushek has noted, if the new research is right in its causal claims about more money improving outcomes, why have huge spending increases over decades not been associated with the kinds of improvements the “causal” research claims to find?

10 Responses to The New “Causal” Research on School Spending is Not Causal

  1. Greg Forster says:

    The absence of the phrase “junk science” from this post represents admirable restraint on your part, Jay. Holy cow – 20 percentage points!

    “The difference between a landslide election victory and a landslide election defeat is purely chance.”

  2. sstotsky says:

    https://www.educationnext.org/money-matters-after-all/ 7/17/2015, Eric Hanushek: “It just says that the outcomes observed over the past half century – no matter how massaged – do not suggest that just throwing money at schools is likely to be a policy that solves the significant U.S. schooling problems seen in the levels and distribution of outcomes. We really cannot get around the necessity of focusing on how money is spent on schools.”

    Why don’t researchers or budget managers provide more refined categories instead of broad categories like personnel, facilities maintenance, etc.? At least move us closer to answers on how money is spent in our schools. https://urldefense.proofpoint.com/v2/url?u=http-3A__jaypgreene.com_2020_02_25_the-2Dnew-2Dcausal-2Dresearch-2Don-2Dschool-2Dspending-2Dis-2Dnot-2Dcausal_&d=DwMFaQ&c=7ypwAowFJ8v-mw8AB-SdSueVQgSDL4HiiSaLK01W8HA&r=bFeTUvEdv7Vp8-6XQBMcliPK2aZO7SPjo61r3aQbLLs&m=TAPCLU6OH4e4qtl41SSxZfWA036x2xU-3YWnaiBCogI&s=deCJtDDFDZufwv42UdIuKeqEwT2drYrNX25WfiKmQcE&e=

  3. […] Jay Greene’s analysis of a new Journal of Public Economics study of spending and student performance is a must-read.  Professor Greene writes, […]

  4. Mark Dynarski says:

    If the identifying assumptions of these new studies are not fully met, should we disregard their findings?

    If we do, we should disregard all the earlier studies, right? They were not causal. Which means we also have to acknowledge that the ‘money doesn’t matter’ finding is itself flawed–in fact more so than the findings from these newer studies, which at least have the benefit of starting from causal logic.

    The confusion here is that the new findings are partly causal. We don’t know how large the ‘partly’ is, though. It’s the same problem as a randomized experiment in which the randomization is done incorrectly for some participants. Measures of effects from the experiment are not strictly correct, but may be close to the true effects.

    • Greg Forster says:

      Well, I’m certainly open to the possibility of setting the threshold of scientific rigor so high that all existing empirical studies on spending and outcomes – along with all the studies on class sizes, etc etc etc – fail to meet it, leaving us to make policy on the basis of nothing but 1) the undeniable aggregate reality that we have spent several generations ballooning the total amount of money we give public schools, and have no visible improvement in outcomes whatsoever to show for it, and 2) a large body of unimpeachable, gold-standard empirical evidence showing that school choice improves academic outcomes.

      But I think it would be better to think about scientific rigor in terms of a continuum, with some studies better than others, and to recognize that the recent studies Jay reviews here are a *lot* further toward the bad end of the continuum than the studies finding that spending increases don’t improve outcomes.

    • I agree that earlier observational studies are also problematic and I noted how the new researchers have made a useful contribution by pointing that out. That being said, I don’t think it’s right to say that RD and IV designs that violate causal requirements are at least partially causal. It’s difficult to know the magnitude (and sometimes direction) of the bias when an IV has an endogenous instrument. Similarly it is hard to know the extent to which cases further from the threshold bias the functional form in an RD. And the same could be said of observational studies. I’m not saying I know what the true relationship is. I’m just saying that the new research is not causal as its advocates claim.

      • Mark Dynarski says:

        And I would say the new research is not *as* causal as its advocates claim. There is no sharp dividing line. If there were, it could be argued that a randomized trial with attrition (which in education is 100 percent of them) is not purely causal, because the attrition could be related to unobservables–can’t prove otherwise. The point is that once the assumption that the randomized treatment is independent of all other factors is not realized, we enter an empirical realm where degrees of attrition need to be considered.

        And so with bandwidth. A wider bandwidth admits more functional forms and a reader can decide that the bandwidth violates their sense of how wide it *should* be, but that’s more a taste thing than a technical argument. If the underlying assumptions of the RD design are met, the estimates are valid even with a wide bandwidth. It just becomes harder to maintain that the assumptions are met.

        As for IV studies, I generally don’t find instruments credible except for treatment assignment as an IV to estimate a TOT effect. That’s a taste thing too, but it always proves a bit too easy to argue that some chosen instrument is correlated with the outcome through one or more channels.

      • I agree. I am making a theoretical objection, not a technical one, akin to an RCT where the attrition is too high. At some point the attrition is so high that we just can’t make the leap of faith that the results are really causal, even if it meets empirical tests like balance on baseline covariates. Similarly, an RD with such a wide bandwidth no longer seems to be plausible as causal, even if it meets empirical tests. And an IV with such an endogenous instrument is no longer plausible as causal. So this is not a matter of degree of causal. At some point the theoretical requirements for thinking of these approaches as causal are violated enough that we just don’t believe them. Of course, the same is true of observational analyses, like matching. Even if it meets the empirical test of balance at baseline, if we theoretically believe there is selection, then we just don’t believe the results are causal.

        This is a theoretical argument, not an empirical one. And I have not seen anyone on Twitter engage with this issue at all.

  5. Mike G says:

    This was an informative Jay blog and Mark-Jay exchange. Thx.
