There has been a flurry of research recently claiming to find compelling causal evidence that increasing school spending would significantly improve student outcomes and avoiding cuts in spending would prevent significant harm. This research has been embraced so quickly as settled fact that over 400 researchers and advocates signed a group letter citing it while urging the federal government to provide financial support to local schools during the COVID recession. The confident conclusion that spending more is the path to improving education is so appealing that the research behind that claim has received remarkably little scrutiny.
A new study by Jessica Goldstein and Josh McGee begins to remedy this lack of skepticism by carefully attempting to replicate the most recent school finance study co-authored by Kirabo Jackson with Cora Wigger and Heyu Xiong, which is forthcoming in American Economic Journal: Economic Policy and has appeared in Education Next. Jackson, Wigger, and Xiong examine the effect of K-12 spending cuts during the Great Recession by comparing the downturn in states where much of the funding comes from state revenue to states where more funding comes from local sources. The idea is that state revenue is more sensitive to a recession, and so cuts would be more severe in states that were more reliant on state sources, even when the effects of the recession on the state’s economy were the same. Using this technique, they conclude that K-12 spending cuts hurt student outcomes.
Goldstein and McGee are able to reconstruct what Jackson, Wigger, and Xiong report, but they find that those results are highly sensitive to the non-standard ways in which the original authors construct their statistical model, disappearing or even changing direction when trivial changes are made. Goldstein and McGee also highlight some serious problems with the data used in the original study.
Because these may sound like minor technical disputes, let me describe some of the issues in non-technical language so that readers can more easily grasp how much this replication effort undermines confidence in the original claims. As Goldstein and McGee put it, “Econometric models can be constructed in a variety of ways, and many modeling choices may be somewhat arbitrary or theoretically unimportant. However, if the model’s estimates represent the true causal impact, they should be consistent across many different reasonable ways of constructing the model.” The replication effort convincingly demonstrates that the original results claiming significant harms from spending cuts are not robust to these kinds of changes. Of the many theoretically reasonable ways the original study could have constructed its model, the authors managed to find one of the few that yields significant positive results; most of the alternatives would have yielded null results.
To compare states that are highly reliant on state revenue for K-12 spending to those that rely on it much less, Jackson, Wigger, and Xiong divide the 50 states and DC into three groups: those with more than 67% of K-12 spending coming from state sources, those with less than 33% coming from state sources, and all others in the middle. Dividing states in this way places only four states in the high-reliance group and three in the low-reliance group, with the remaining 44 in the middle. The main results they present are based on the difference in outcomes between the top four and bottom three states. This thin slice of states contains the two strange cases of DC and Hawaii, each of which has only a single school district, making the distinction between state and local revenue meaningless. Goldstein and McGee try changing the thresholds for classifying states into the high and low categories to see if the results hold when comparing the top versus bottom quartiles or deciles of states. The exact grouping of states into high and low categories should not make much of a difference, but the replication shows that these reasonable alternative categorizations yield null results.
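The sensitivity to grouping rules described above can be made concrete with a small sketch. All state names and revenue shares below are invented for illustration; the point is only that fixed cutoffs and data-driven cutoffs can select different comparison groups:

```python
# Illustrative sketch only: hypothetical revenue shares, not the study's data.
import statistics

shares = {  # fraction of K-12 spending from state sources (invented numbers)
    "A": 0.90, "B": 0.70, "C": 0.68, "D": 0.66, "E": 0.50,
    "F": 0.45, "G": 0.35, "H": 0.30, "I": 0.20,
}

def classify(shares, high_cut, low_cut):
    """Return the (high, low) groups implied by a pair of cutoffs."""
    high = sorted(s for s, v in shares.items() if v > high_cut)
    low = sorted(s for s, v in shares.items() if v < low_cut)
    return high, low

# Fixed cutoffs in the spirit of the original study's 67% / 33% rule:
print(classify(shares, 0.67, 0.33))  # (['A', 'B', 'C'], ['H', 'I'])

# Data-driven cutoffs (quartiles), one of the alternatives the replication tries:
q1, _, q3 = statistics.quantiles(shares.values(), n=4)
print(classify(shares, q3, q1))      # (['A', 'B'], ['H', 'I'])
```

If the estimated effect survives only under one such grouping, that is evidence the result is an artifact of the classification rule rather than a stable causal finding.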
Similarly, the original study recognized the importance of separating the effects of spending cuts in certain states from changes attributable to the time periods common to all states. Ideally, they would introduce a dummy variable for each year, which they say they tried but found yielded insignificant results. Instead, they chose to group years into pre-recession, recession, and post-recession periods to control for idiosyncratic effects of changes over time. The years they label as pre-, during, and post-recession, however, are not consistent with the official dating of the recession by the National Bureau of Economic Research. So the replication makes slight adjustments to how years are categorized and discovers that doing so yields null results, sometimes with negative estimated effects of spending on student outcomes. Again, real results should not disappear when these kinds of trivial changes are made.
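To see how small this kind of coding change is, here is a minimal sketch. The specific year sets are invented for illustration; the NBER dates the Great Recession from December 2007 to June 2009, so with annual data more than one coding of the "recession years" is defensible:

```python
# Sketch of the period-binning choice (year sets invented for illustration).
def period(year, recession_years):
    """Assign a year to a pre/recession/post bin given a recession window."""
    if year < min(recession_years):
        return "pre"
    if year in recession_years:
        return "recession"
    return "post"

years = range(2005, 2013)
coding_a = {y: period(y, {2008, 2009}) for y in years}
coding_b = {y: period(y, {2008, 2009, 2010}) for y in years}  # one year wider

# 2010 is "post" under one coding and "recession" under the other -- a
# trivial change that should not flip a genuine result.
print(coding_a[2010], coding_b[2010])  # post recession
```

A genuinely causal estimate should be stable across such equally defensible codings; the replication finds it is not.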
The replication also considers the original study’s claim that spending cuts reduce college-going in the year following the spending change. The theoretical mechanism by which this effect is produced is unclear, given that college-going is likely the result of more than a decade of educational investment, not just the previous year’s spending. Goldstein and McGee offer an alternative pathway by which college-going might be reduced: state expenditures on higher education. As it turns out, states that rely heavily on state revenue for K-12 spending are also places where higher education relies heavily on state spending. When those states cut K-12 spending during the Great Recession, they also cut higher education funding. The replication substitutes higher education spending for K-12 spending in the original model, which yields similar effects on college-going rates. This clearly demonstrates that the original study had not isolated the causal effect of K-12 spending cuts from the similar effects of higher education reductions.
Lastly, the replication reveals several problems with the data used in the original study. For example, the original study reports Vermont as having 68.3% of K-12 spending coming from state revenue, while the Census, the data source they cite, indicates that figure should be 88.5%. Similarly, Arkansas’ state share of spending is listed as 75.7%, which is consistent with the Census figure but is almost 20 percentage points different from the number provided by the National Center for Education Statistics. It is not obvious which is the better figure to use, and these disparities reveal that identifying the state share of K-12 spending, on which the entire analysis depends, is problematic. Most alarmingly, the results Jackson, Wigger, and Xiong present in Figure 3 of their Education Next article, which claims to show the effects of comparing states above and below the national median of reliance on state revenue, could not be replicated by Goldstein and McGee (see their Figures 7-9) and are almost certainly in error. Done correctly, Figure 3 would show no effects on student outcomes from spending cuts.
It is disconcerting that neither the reviewers at AEJ-Policy nor the more than 400 researchers who signed the group letter detected these data problems or raised questions about the unusual way in which states and years were grouped in the original study. Before quickly embracing desired findings, the field needs to restore the traditional scientific virtues of skepticism and humility and apply them more generally to the new research on school spending effects.
I was one of the signers of the letter. The main thrust of the letter was to advocate for states and school districts to get stimulus resources, not to endorse Jackson’s findings. The letter mentions Jackson’s research in a single sentence and provides a link to his paper surveying findings from 40 or so studies. Some showed positive findings and some did not.
It was always a bit of an odd puzzle that schools would be the only context in which resources didn’t matter. (One could say the whole discipline of economics is based on resources mattering.) But as Rick Hanushek has cautioned, researchers need to make the case, not take it for granted. And most school resources studies are simply too confounded to yield sharp causal conclusions, the kind we expect from true experiments.
In this context, the two studies are using 450 observations and applying instrumental variable methods, which have desirable properties only asymptotically. That is really pushing it, and I am not surprised that the Goldstein-McGee replication finds a lack of robustness to changing model specifications. But even some of their models yielded findings that resources matter, so readers are left to draw their own conclusions.
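The small-sample concern about instrumental variables can be illustrated with a toy simulation. Everything here is invented (true effect, instrument strength, error correlation); the point is only that a just-identified 2SLS estimate at n = 450 is consistent yet still carries substantial sampling noise:

```python
import numpy as np

# Toy simulation (all parameters invented): just-identified 2SLS at n = 450.
rng = np.random.default_rng(0)
beta = 1.0                                 # true causal effect
n, reps = 450, 2000
iv_estimates = []
for _ in range(reps):
    z = rng.normal(size=n)                 # instrument
    e = rng.normal(size=n)                 # first-stage error
    u = 0.8 * e + rng.normal(size=n)       # structural error, correlated with e
    x = 0.5 * z + e                        # endogenous regressor
    y = beta * x + u
    iv_estimates.append((z @ y) / (z @ x)) # just-identified 2SLS estimate
iv_estimates = np.array(iv_estimates)

# Mean lands near the true beta = 1, but the spread across samples is wide
# enough that individual estimates routinely miss by 10-20%.
print(round(iv_estimates.mean(), 2), round(iv_estimates.std(), 2))
```

Under these assumed parameters the instrument is strong; with a weaker instrument, the spread at n = 450 would be wider still.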
Researchers are parodied for always concluding that more research is needed. But on the school resources question, they are right.
That increased spending doesn’t necessarily make government programs more effective in pursuing their stated (as opposed to their real) goals is far from a surprise or a theoretical puzzle; it has been a common observation and an extensive subject of theoretical reflection in the scholarly literature for generations. There are of course many debates to be had about public choice theory and so on, and scholars hold many different views on the question. But if you aren’t even aware of the basic theoretical issues that have been at the center of social-science discourse about these questions since at least the 1950s, you might consider heeding Jay’s much-needed call for a return to the traditional scholarly values of skepticism and humility.
Also, whether government schools perform better when we pour more money into the bureaucracy that runs them is a distinct question from whether school choice policies cause schools to perform better. Both can be true!
Jay, as I said last time, in your treatment of the flagrant flaws in this research you are the model of self-restraint. “Let’s put the states into three categories, with 44 states falling in one of the categories, and then compare them – and let’s not do robustness checks that might show that our results are an artifact of our arbitrary classification system!” Not to mention including Hawaii and DC in a comparison of state versus local funding when they each have only one school district.
Time to review this golden oldie.
I wonder why we have not heard about the savings made by schools after the lockdown. Electricity costs (heating/cooling) are only one possibility. Don’t school board members know about turning thermostats down? Sandra Stotsky