There has been a flurry of research recently claiming to find compelling causal evidence that increasing school spending would significantly improve student outcomes and avoiding cuts in spending would prevent significant harm. This research has been embraced so quickly as settled fact that over 400 researchers and advocates signed a group letter citing it while urging the federal government to provide financial support to local schools during the COVID recession. The confident conclusion that spending more is the path to improving education is so appealing that the research behind that claim has received remarkably little scrutiny.
A new study by Jessica Goldstein and Josh McGee begins to remedy this lack of skepticism by carefully attempting to replicate the most recent school finance study co-authored by Kirabo Jackson with Cora Wigger and Heyu Xiong, which is forthcoming in American Economic Journal: Economic Policy and has appeared in Education Next. Jackson, Wigger, and Xiong examine the effect of K-12 spending cuts during the Great Recession by comparing the downturn in states where much of the funding comes from state revenue to states where more funding comes from local sources. The idea is that state revenue is more sensitive to a recession, and so cuts would be more severe in states that were more reliant on state sources, even when the effects of the recession on the state’s economy were the same. Using this technique, they conclude that K-12 spending cuts hurt student outcomes.
Goldstein and McGee are able to reconstruct what Jackson, Wigger, and Xiong report, but they find that the results are highly sensitive to the non-standard ways in which the original authors construct their statistical model, and that the estimates disappear or even change direction when trivial changes are made. Goldstein and McGee also highlight some serious problems with the data used in the original study.
Because these may sound like minor technical disputes, let me describe some of the issues in non-technical language so that readers can more easily grasp how much this replication effort undermines confidence in the original claims. As Goldstein and McGee put it, “Econometric models can be constructed in a variety of ways, and many modeling choices may be somewhat arbitrary or theoretically unimportant. However, if the model’s estimates represent the true causal impact, they should be consistent across many different reasonable ways of constructing the model.” The replication effort convincingly demonstrates that the original results claiming significant harms from spending cuts are not robust to these kinds of changes. Of the many theoretically reasonable ways to construct the model, the authors of the original study managed to find one of the few that yields significant positive results among the many that would have yielded null results.
To compare states that are highly reliant on state revenue for K-12 spending to those that rely much less, Jackson, Wigger, and Xiong divide the 50 states and DC into three groups: those with more than 67% of K-12 spending coming from state sources, those with less than 33% coming from state sources, and all others in the middle. Dividing states in this way places only four states in the high-reliance group and three in the low-reliance group, with the remaining 44 in the middle. The main results they present are based on the difference in outcomes between the top four and bottom three states. This thin slice of states contains the two strange cases of DC and Hawaii, each of which has only a single school district, so the distinction between state and local revenue is not meaningful there. Goldstein and McGee try changing the thresholds for classifying states into the high and low categories to see whether the results hold when the top and bottom quartiles or deciles of states are compared instead. The exact grouping of states into high and low categories should not make much of a difference, but the replication shows that these reasonable alternative ways of categorizing states yield null results.
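To make the threshold-sensitivity point concrete, here is a minimal sketch of the kind of reclassification the replication performs. The state names, shares, and alternative cutoffs below are hypothetical placeholders for illustration; this is not the authors' actual code or data.

```python
# Hypothetical state shares of K-12 revenue coming from state sources.
state_share = {
    "StateA": 0.72, "StateB": 0.70, "StateC": 0.45,
    "StateD": 0.30, "StateE": 0.28, "StateF": 0.55,
}

def classify(shares, low_cut, high_cut):
    """Split states into low/middle/high reliance groups at the given cutoffs."""
    groups = {"low": [], "middle": [], "high": []}
    for state, share in shares.items():
        if share > high_cut:
            groups["high"].append(state)
        elif share < low_cut:
            groups["low"].append(state)
        else:
            groups["middle"].append(state)
    return groups

# Cutoffs in the spirit of the original study (33% / 67%):
original = classify(state_share, 0.33, 0.67)
# An equally defensible alternative split moves StateC into the low group:
alternative = classify(state_share, 0.50, 0.65)
```

If the estimated effect is real, it should survive this kind of regrouping; the replication's point is that with so few states in the extreme groups, small changes in the cutoffs change which states drive the comparison.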
Similarly, the original study recognized that it is important to separate the effects of spending cuts in certain states from changes over time that affected all states. Ideally, the authors would introduce a dummy variable for each year, which they say they tried but which yielded insignificant results. Instead, they chose to group years into pre-recession, recession, and post-recession periods to control for idiosyncratic effects of changes over time. The years they label as pre-, during, and post-recession, however, are not consistent with the official dating of the recession by the National Bureau of Economic Research. So the replication makes slight adjustments in how years are categorized and discovers that doing so yields null results, sometimes with negative estimated effects of spending on student outcomes. Again, real results should not disappear when these kinds of trivial changes are made.
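The period-grouping choice can be sketched in a few lines. The recession windows below are hypothetical illustrations (the NBER dates the Great Recession from December 2007 to June 2009, so a 2008-2009 window is one defensible choice); this is not the authors' actual coding.

```python
def period_dummies(year, recession_years):
    """Return indicator variables for the recession and post-recession
    periods; pre-recession is the omitted baseline category."""
    return {
        "recession": int(year in recession_years),
        "post": int(year > max(recession_years)),
    }

years = range(2005, 2013)
window_a = {2008, 2009}        # one defensible recession window
window_b = {2008, 2009, 2010}  # a slightly wider, also defensible window

dummies_a = {y: period_dummies(y, window_a) for y in years}
dummies_b = {y: period_dummies(y, window_b) for y in years}
```

Under the two windows, a year like 2010 is labeled "post-recession" in one model and "recession" in the other. That relabeling is exactly the kind of trivial change a genuine causal estimate should shrug off.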
The replication also considers the original study’s claim that spending cuts reduce college-going in the year following the spending change. The theoretical mechanism by which this effect is produced is unclear, given that college-going is likely the result of more than a decade of educational investment, not just the previous year’s spending. Goldstein and McGee offer an alternative pathway by which college-going might be reduced: state expenditures on higher education. As it turns out, states that rely heavily on state revenue for K-12 spending are also places where higher education relies heavily on state spending. When those states cut K-12 spending during the Great Recession, they also cut higher education funding. The replication substitutes higher education spending for K-12 spending in the original model, which yields similar effects on college-going rates. This clearly demonstrates that the original study did not isolate the causal effect of K-12 spending cuts from the similar effects of higher education cuts.
Lastly, the replication reveals several problems with the data used in the original study. For example, the original study reports Vermont as having 68.3% of K-12 spending coming from state revenue, while the Census, the data source the authors cite, indicates that the figure should be 88.5%. Similarly, Arkansas’ state share of spending is listed as 75.7%, which is consistent with the Census figure but is almost 20 percentage points different from the number provided by the National Center for Education Statistics. It is not obvious which figure is better to use, and these disparities reveal that identifying the state share of K-12 spending, on which the entire analysis depends, is problematic. Most alarmingly, the results Jackson, Wigger, and Xiong present in Figure 3 of their Education Next article, which claims to show the effects of comparing states above and below the national median of reliance on state revenue, could not be replicated by Goldstein and McGee (see their Figures 7-9) and are almost certainly in error. Done correctly, Figure 3 would show no effects of spending cuts on student outcomes.
It is disconcerting that neither the reviewers at AEJ-Policy nor the more than 400 researchers who signed the group letter were able to detect these data problems or raise questions about the unusual way in which states and years were grouped in the original study. Before quickly embracing desired findings, the field needs to restore the traditional scientific virtues of skepticism and humility and apply them more generally to the new research on school spending effects.