School Choice Boosts Test Scores

May 10, 2016

(Guest post by Patrick J. Wolf)

Private school choice remains a controversial education reform.  Choice programs, involving school vouchers, tax-credit scholarships, or Education Savings Accounts (ESAs), provide financial support to families who wish to access private schooling for their child.  Though professional commentators such as Diane Ravitch and Greg Anrig once declared the movement dead in the U.S., there are now 50 private school choice programs in 26 states plus the District of Columbia.  Well over half of the initiatives have been enacted in the past five years.  Private school choice is all the rage.

But does it work?  M. Danish Shakeel, Kaitlin Anderson, and I just released a meta-analysis of 19 “gold standard” experimental evaluations of the test-score effects of private school choice programs around the world.  The sum of the reliable evidence indicates that, on average, private school choice increases the reading scores of choice users by about 0.27 standard deviations and their math scores by 0.15 standard deviations.  These gains are highly statistically significant and educationally meaningful, amounting to several months of additional learning.  The achievement benefits of private school choice appear to be somewhat larger for programs in developing countries than for those in the U.S.  Publicly-funded programs produce larger test-score gains than privately-funded ones.

The clarity of the results from our statistical meta-analysis contrasts with the fog of dispute that often surrounds discussions of the effectiveness of private school choice.  Why does our summing of the evidence identify school choice as a clear success while others have claimed that it is a failure (see here and here)?  Three factors have contributed to the muddled view regarding the effectiveness of school choice:  ideology, the limitations of individual studies, and flawed prior reviews of the evidence.

School choice programs support parents who want access to private schooling for their child.  Some people are ideologically opposed to such programs, regardless of the effects of school choice.  Other people have a vested interest in the public school system and resist the competition for students and funds that comes with private school choice.  No amount of evidence is going to change their opinion that school choice is bad.

A second source of disputes over the effectiveness of choice is the limitations of each individual empirical study of school choice.  Some studies are non-experimental and can’t entirely rule out selection bias as a factor in their results (see here, and here).  Fortunately, over the past 20 years, some education researchers have been able to use experimental methods to evaluate privately- and publicly-funded private school choice programs.  Experimental evaluations take the complete population of students who are eligible for a choice program and motivated to use it, then employ a lottery to randomly assign some students to receive a school-choice voucher or scholarship and the rest to serve in the experimental control group.  Since only random chance, and not parental motivation, determines who gets private school choice and who doesn’t, gold standard experimental evaluations produce the most reliable evidence regarding the effectiveness of choice programs.  We limit our meta-analysis to the 19 gold standard studies of private school choice programs globally.
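
For readers who want to see the logic in miniature, here is a toy simulation of a lottery-based evaluation.  Every number in it is invented for illustration, not drawn from any of the 19 studies; the point is only that random assignment lets a simple difference in group means recover the true effect.

```python
# Toy simulation of a lottery-based ("gold standard") evaluation:
# randomly assign eligible applicants to a voucher or control group,
# then compare mean outcomes. All numbers are made up for illustration.
import random

random.seed(0)
applicants = 10_000
won_lottery = [random.random() < 0.5 for _ in range(applicants)]  # the lottery

# Hypothetical outcome: a baseline score drawn from N(0, 1), plus a
# 0.2 SD boost for lottery winners. Because assignment is random,
# the difference in group means is an unbiased estimate of that
# boost -- parental motivation cannot contaminate the comparison.
scores = [random.gauss(0, 1) + (0.2 if won else 0.0) for won in won_lottery]

treatment = [s for s, won in zip(scores, won_lottery) if won]
control = [s for s, won in zip(scores, won_lottery) if not won]
estimate = sum(treatment) / len(treatment) - sum(control) / len(control)
print(f"Estimated effect: {estimate:.2f} SD")  # close to 0.2 in large samples
```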

Each of the gold standard studies, in isolation, has certain limitations.  In the experimental evaluation of the initial DC Opportunity Scholarship Program that I led from 2004 to 2011, the number of students in testing grades dropped substantially from year 3 to year 4, leading to a much noisier estimate of the reading impacts of the program, which were positive but just missed being statistically significant at the 95% confidence level.  Two experimental studies of the Charlotte privately-funded scholarship program, here and here, reported clear positive effects on student test scores but were limited to just a single year after random assignment.  Two recent experimental evaluations of the Louisiana Scholarship Program found negative effects of the program on student test scores, but one study was limited to just a single year of outcome data and the second one (which I am leading) has analyzed only two years of outcome data so far.  The Louisiana program, and the state itself, are unique in certain ways, as are many of the programs and locations that have been evaluated.  What are we to conclude from any of these individual studies?

Meta-analysis is an ideal approach to identifying the common effect of a policy when many rigorous but small and particular empirical studies vary in their individual conclusions.  It is a systematic and scientific way to summarize what we know about the effectiveness of a program like private school choice.  The sum of the evidence points to positive achievement effects of choice.
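
For those curious about the mechanics, here is a minimal sketch of the inverse-variance weighting at the core of a fixed-effect meta-analysis.  The effect sizes and standard errors below are hypothetical placeholders, not results from our study, and a published meta-analysis would typically also report random-effects estimates and heterogeneity statistics not shown here.

```python
# Minimal fixed-effect meta-analysis: pool study-level effect sizes by
# weighting each estimate by the inverse of its variance. The numbers
# below are hypothetical placeholders, not values from the
# Shakeel, Anderson & Wolf meta-analysis.

effects = [0.20, 0.35, 0.10, 0.28]      # per-study effect sizes (SD units)
std_errors = [0.08, 0.12, 0.06, 0.10]   # per-study standard errors

weights = [1 / se**2 for se in std_errors]             # inverse-variance weights
pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5                  # SE of the pooled estimate

print(f"Pooled effect: {pooled:.3f} SD (SE = {pooled_se:.3f})")
```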

Finally, most of the previous reviews of the evidence on school choice have generated more fog than light, mainly because they have been arbitrary or incomplete in their selection of studies to review.  The most commonly cited school choice review, by economists Cecilia Rouse and Lisa Barrow, declares that it will focus on the evidence from existing experimental studies but then leaves out four such studies (three of which reported positive choice effects) and includes one study that was non-experimental (and found no significant effect of choice).  A more recent summary, by Epple, Romano, and Urquiola, selectively included only 48% of the empirical private school choice studies available in the research literature.  Greg Forster’s Win-Win report from 2013 is a welcome exception and gets the award for the school choice review closest to covering all of the studies that fit his inclusion criteria – 93.3%.  (Greg for the win!)

Our meta-analysis avoided all three factors that have muddied the waters on the test-score effects of private school choice.  It is a non-ideological scientific enterprise, as we followed strict meta-analytic principles such as including every experimental evaluation of choice produced to date, anywhere in the world.  Our study was accepted for presentation at competitive scientific conferences including those of the Society for Research on Education Effectiveness, the Association for Education Finance and Policy, and the Association for Policy Analysis and Management.  Our study is not limited by small sample sizes or only a few years of outcome data.  It is informed by all the evidence from all the gold standard studies.  Finally, there is nothing arbitrary or selective in our sample of experimental evaluations.  We included all of them, regardless of their findings.  When you do the math, students achieve more when they have access to private school choice.


Paul Peterson: Expanding Choice is Best Hope for Ed Reform

May 10, 2016

In a sweeping and persuasive review of the past two decades of education reform, Paul Peterson observes that top-down efforts, like standards, testing, and accountability, have run out of steam educationally and politically.  The best way forward, he argues, is to continue working for the steady expansion of choice and competition.

Here are some highlights:

Vouchers and tax credits are slowly broadening their legal footing. Charter schools are growing in number, improving in quality, and beginning to pose genuine competition to public schools, especially within big cities. Introducing such competition is the best hope for American schools, because today’s public schools are showing little capacity to improve on their own.

And on the failure of top-down reforms:

Admittedly, regulatory reform was not invented in Washington. Calls for higher standards, minimum competency tests, and school accountability had surfaced at the state level as early as the 1970s. Southern governors—James Hunt in North Carolina, Bill Clinton in Arkansas, Jeb Bush in Florida, Ann Richards in Texas, and others—played major roles. Outside the South, Massachusetts took the lead….

All these steps required a vast number of regulations. But school districts still found ways of undermining federal objectives. They instituted byzantine procedures that parents had to navigate before they could exercise choice. Afterschool programs offered by private providers were frequently denied space at local schools. Reconstitution of low-performing schools often consisted mostly of window dressing.

Leading him to conclude:

As an education reform strategy, federal regulation is dead. The regulations had little long-term effect, and the political opposition crescendoed. The regulated captured the regulators. Nor is there much appetite for new accountability rules at the state level. If reform is to take place as the rest of the 21st century unfolds, it will happen because more competition is being introduced into the American education system.

Be sure to read the entire piece in Education Next.


Test Score Gains Predict Long-Term Outcomes, So We Shouldn’t Be Too Shy About Using Them

May 10, 2016

Editor’s note: This post is the sixth and final entry in an ongoing discussion between Fordham’s Michael Petrilli and the University of Arkansas’s Jay Greene that seeks to answer this question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results? Prior entries can be found here, here, here, here, and here.

Shoot, Jay, maybe I should have quit while we were ahead—or at least while we were closer to rapprochement.

Let me admit to being perplexed by your latest post, which has an Alice in Wonderland aspect to it—a suggestion that down is up and up is down. “Short-term changes in test scores are not very good predictors of success,” you write. But that’s not at all what the research I’ve pointed to shows.

Start with the David Deming study of Texas’s 1990s-era accountability system. Low-performing Lone Star State schools faced low ratings and responded by doing something to boost the achievement of their low-performing students. That yielded short-term test-score gains, which were related to positive long-term outcomes. This is the sort of thing we’d like to see much more of in education, wouldn’t we?

Yet you focus on the negative finding: that higher-performing Texas schools were more likely to keep their low-performing kids from taking the test, and those kids did worse over the long term. Supposing that’s so, it merely indicates a flaw in Texas’s school accountability system, which should have required schools to test virtually all of their students (as No Child Left Behind later did). The reason that this group of low-performers did worse is most likely because their schools failed even to try to raise achievement. If they had, those kids’ long-term outcomes would have likely been better too.

As for your points on test score “fade-out,” you are right in that we see this phenomenon in both the pre-K studies you mentioned and in Project STAR. Why it happens is an interesting question for which nobody has a great answer, as far as I know, other than the obvious point that the schools and classrooms those kids enter into don’t know how (or don’t try very hard) to sustain earlier gains. But it doesn’t really matter. For our purposes, what it shows is that short-term test score gains don’t lead to long-term test score gains, but they do lead to long-term success. Which is the Holy Grail!

Let’s take it out of the abstract. Let’s say we want to evaluate preschools on whether their students make progress on cognitive assessments, or judge elementary schools based on student-level gains during grades K–3. The evidence indicates that preschools or elementary schools that knock it out of the park in terms of test score gains will see those impacts fade over time, as gauged by test scores. But the kids enrolled in those preschools and elementary schools will benefit in the long term. Whatever the schools are doing to raise short-term test scores is also helping lead to later success; we can measure the scores, but we can’t measure the other stuff. But remind me again: Why wouldn’t we want to use short-term test scores as one gauge of school or program quality?

You end much as you begin:

Rather than relying on test results anyway and making potentially disastrous decisions to close schools or shutter programs on bad information, we should recognize that local actors—including parents—are in a better position to judge school quality. Their preferences deserve strong deference from more distant authorities.

And as I’ve written previously, we’re of one mind in being “anti-bad-information.” We should absolutely stop using performance levels alone (i.e., proficiency rates) to judge school quality. We should be concerned about accountability systems or authorizing practices that might encourage counterproductive practices—like excluding kids from testing or focusing narrowly on reading and math skills instead of a broad curriculum. And we also agree that parents deserve much deference.

But I don’t agree that short-term achievement gains should be put in the “bad information” bucket. And I think you’re being a tad naïve about the quality of “information” that parents themselves have about their schools, which is often extremely limited or hard to interpret. Most parents (myself included) have only a hazy picture of “school quality” and how to know whether it’s present at our own kids’ schools. You know if your child is happy, if the teacher is welcoming, and if the place is safe. It’s a lot harder to know how much learning is taking place, especially in an age when grade inflation is rampant. (Why else would 90 percent of the nation’s parents think that their own children were on grade level?) The government has a role to play in making sure that all school choices meet a basic threshold for quality, just as it has a role in making sure that all of our choices at the grocery store are safe.

So I return to my proposition: Let’s not make high-stakes decisions about schools or programs based on test scores alone. But let’s not ignore those scores, either, or trivialize their influence to such an extent that we allow chronically low-performing schools to persist, zombie-like, in perpetuity.

Charter school authorizers and other quality monitors should react swiftly when schools post mediocre or worse value-added scores. They should give those schools—and their parents—the chance to demonstrate their quality through other means. They should do what they can to turn failure into success, hard though that is. But for the good of the kids, the public, and the sector, they shouldn’t hesitate to shutter schools that aren’t helping children progress.

***

And with that, let me thank Jay for a great debate. We may not agree on what test scores can tell us, but I’m heartened that we concur that there are times when officials must act to address low performance. Parental choice is necessary, but not sufficient. Q.E.D.

– Mike Petrilli

This first appeared on Flypaper.


Populist Policies will Harm the Poor

May 9, 2016

(Guest Post by Matthew Ladner)

After the 1929 stock market crash the American public became firmly convinced that the government had to do something: “Action and Action Now!” as Franklin Roosevelt put it. In reality, the government had already taken far too much action and deepened the crisis. Congress foolishly passed, and President Hoover signed, the Smoot-Hawley Tariff Act, which set off a global trade war. The Federal Reserve tightened credit during the downturn (the Fed had even less of an idea about what it was doing in those days) in a move reminiscent of the medieval medical practice of applying leeches. Anxious to do their part, the New Dealers flailed about chaotically, creating a never-ending series of agencies dedicated to the proposition that the American government knew how to order our affairs better than the American people.  The United States had plenty of stock market crashes and recessions before 1929, but previous downturns tended to be short-lived. It took a parade of highly empowered fools to create and sustain the Great Depression. To the credit of the New Dealers, they at least recognized and corrected Hoover’s folly by creating a system for liberalized global trade after World War II, a system that helped generate and sustain growing global prosperity.

Angry populists of today should take the time to study this sad history. “I’m from Washington, and I’m here to help!” was once understood as the punchline to a joke, thanks to the unfortunate tendency of government action to backfire. Efforts on the left to raise the minimum wage to $15, for instance, will doubtless accelerate the substitution of technology for routine labor. The inexperienced and unskilled will suffer most. We’ve already seen the advent of an automated hamburger joint:

Great for a relatively small group of automation experts, not so much for a huge group of 16-year-old kids looking for work.

Likewise, let’s consider two scenarios for the United States slapping 45% tariffs on foreign goods. In the first scenario, other countries recognize our greatness and beg for forgiveness, submitting to whatever demands we care to make. In the second scenario, impacted nations retaliate with tariffs against American goods sold in their markets, costing the United States jobs. We reverse the economic benefits of comparative advantage and specialization to indulge in another idiotic trade war of the sort the World War II generation wisely swore off. Whatever net manufacturing employment is gained represents a direct transfer from the pockets of consumers, who now must pay higher prices for goods. Needless to say, those with the least money suffer the most from increased prices.

The first scenario seems fantastically unlikely, the second almost certain. H.L. Mencken stated that “Democracy is the theory that the common people know what they want, and deserve to get it good and hard.” Let’s be careful what we wish for: empowering a new generation of economic illiterates will end in disaster, especially for the poor.


Regulators need to use test scores with great care

May 6, 2016

Editor’s note: This post is the fifth in an ongoing discussion between Fordham’s Michael Petrilli and the University of Arkansas’s Jay Greene that seeks to answer this question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results? Prior entries can be found here, here, here, and here.

Mike, you say that we agree on the limitations of using test results for judging school quality, but I’m not sure how true that is. In order not to get too bogged down in the details of that question, I’ll try to keep this reply as brief as possible.

First, the evidence you’re citing actually supports the opposite of what you are arguing. You mention the Project STAR study showing that test scores in kindergarten correlated with later-life outcomes as proof that test scores are reliable indicators of school or program quality. But you don’t emphasize an important point: whatever benefits students experienced in kindergarten that raised their test scores did not produce higher test scores in later grades—even though they produced better later-life outcomes. As the authors put it, “The effects of class quality fade out on test scores in later grades, but gains in non-cognitive measures persist.” This is an example of the disconnect between test scores and life outcomes, which is exactly what I’ve been arguing. If we used test scores as a proxy for school or program quality, we would wrongly conclude that this program did not help, since the test score gains faded even though the benefits endured.

You also draw the wrong conclusion from the Deming et al. article. The authors did find that test score gains for lower-scoring students in lower-performing schools resulted in higher earnings for those students. But lower-scoring students in higher-performing schools experienced a decline in later-life earnings that was even larger in absolute terms. These results highlight two things. First, narrowly focusing on raising test scores helps some low-scoring students but harms other low-scoring students, such that:

This negative impact on earnings is larger, in absolute terms, than the positive earnings impact in schools at risk of being rated Low-Performing. However, there are fewer low-scoring students in high-scoring schools, so the overall effects on low-scoring students roughly cancel one another out. Again, we find no impact of accountability pressure on higher-achieving students.

Having no net effect on low-scoring students, as well as having no effect of any kind on higher-scoring students, does not sound like a ringing endorsement of using accountability pressure to focus narrowly on test scores.

Second, the pattern of results in that paper supports my argument about the disconnect between test score gain and changes in later-life outcomes. Low-scoring students in higher-performing schools only experienced a decline of 0.4 percent in the probability of passing the tenth-grade math exam, but they exhibited a decline in annual earnings of $748 at age twenty-five. The low-scoring students in low-performing schools experienced a much larger 4.7 percent increase in the probability of passing the tenth-grade math exam, but they only exhibited an increase of $298 in earnings at age twenty-five. A negligible drop in test scores was associated with a large decline in earnings, while a large increase in test performance resulted in a more modest gain in earnings. See the disconnect?
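
To make the disconnect concrete, here is a back-of-the-envelope calculation using the figures just cited.  The “dollars per percentage point” framing is my own illustration, not a statistic reported in the paper.

```python
# Back-of-the-envelope: earnings change at age 25 per percentage-point
# change in the probability of passing the tenth-grade math exam,
# using the figures quoted above. The ratio framing is illustrative.

higher_performing = {"pass_change_pts": -0.4, "earnings_change_usd": -748}
lower_performing = {"pass_change_pts": 4.7, "earnings_change_usd": 298}

for label, s in [("Higher-performing schools", higher_performing),
                 ("Lower-performing schools", lower_performing)]:
    ratio = s["earnings_change_usd"] / s["pass_change_pts"]
    print(f"{label}: ${ratio:,.0f} in earnings per percentage point")

# Output: roughly $1,870 per point for the first group versus about
# $63 per point for the second. If short-term test gains mapped
# cleanly onto later-life outcomes, these ratios would be similar.
```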

You’re also mistaken in your belief that the evidence of this disconnect is confined to high schools. There is a fairly large literature on early education that shows little or no enduring test score gains from preschool but large benefits later in life. Again, gains in test scores do not appear to capture very well the quality of schools or programs. In addition, a series of studies by David Grissmer and colleagues found that early math and reading achievement tests are not even very good predictors of later test results relative to other types of skills and more general knowledge. They conclude: “Paradoxically, higher long-term achievement in math and reading may require reduced direct emphasis on math and reading and more time and stronger curricula outside math and reading.”

I could go on, but I promised to be brief. The overall point is that if tests were reliable indicators of school and program quality, they should consistently be predictive of later-life outcomes. As this brief review of research demonstrates, it is quite common for test score results not to be predictive of later-life outcomes. If even rigorous research fails to show a consistent relationship between test scores and later success, why would we think that regulators and policy makers with less rigorous approaches to test scores could use them to reliably identify school and program quality? Rather than relying on test results anyway and making potentially disastrous decisions to close schools or shutter programs on bad information, we should recognize that local actors—including parents—are in a better position to judge school quality. Their preferences deserve strong deference from more distant authorities.


Rely on local actors, instead of faulty information, to make judgements about school quality

May 4, 2016

Editor’s note: This post is the third in an ongoing discussion between Fordham’s Michael Petrilli and the University of Arkansas’s Jay Greene that seeks to answer this question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results? Prior entries can be found here [by Jay] and here [by Mike].

Jay P. Greene:

It’s always nice to find areas of agreement, but I want to be sure that we really do agree as much as you suggest, Mike. I emphasized that it should take “a lot more than ‘bad’ test scores” to justify overriding parental preferences. You say that you agree. But at the end, you add that we may have no choice but to rely primarily on test scores to close schools and shutter programs—or else “succumb to ‘analysis paralysis’ and do nothing.”

This is a false dichotomy. If all we have are unreliable test scores, we don’t have to make decisions based on them or “do nothing.” Instead, we could rely on local actors who have more contextual knowledge about school or program quality. So if the charter board, local authorizer, and parents think a school is doing a good job even if test scores look “bad,” we should defer to them. That isn’t doing nothing; it’s relying on those who know more than can be gleaned from test scores. And quite often, those more knowledgeable local actors will be parents, which is why I think we should show strong deference to parental preferences. We don’t have to substitute uninformed decisions by distant regulators for those of more knowledgeable parents.

The danger with your argument—that we may have no choice but to rely on test scores—is that it rationalizes ignorant actions by policy makers whose knowledge of school or program quality consists almost entirely of test score results. Even worse, they almost always rely on levels of test results rather than gains. It’s important to emphasize how crude and inaccurate decisions based on test scores typically are, rather than to imagine them to be as sophisticated as analyses found in leading journals (which are still quite imperfect). Using only levels of test scores, regulators and policy makers are quite content to label schools serving highly disadvantaged populations as “bad.” The perverse result is that those schools trying to serve needy populations, or those that do not focus narrowly on math and reading test scores, are likely to be punished or closed.

I’m glad we agree that “it should take a lot more than ‘bad’ test scores” to “close a school or shutter a program in the face of parental demand.” And I concur that we may never be able to develop other reliable indicators of school quality to be used by distant regulators or policy makers, including measures of character skills like grit and conscientiousness. But if we’re unable to develop strong measures of school quality that can be used remotely, the logical conclusion to be drawn is not that we ought to rely on them anyway. Instead, we should rely on the judgments of those closer to the situation, including parents, who have better information about school quality.

I accept that this will sometimes mean closing schools or programs that some parents nevertheless want. But I believe that few schools with long waiting lists will also be poorly graded by local actors using their broad contextual knowledge. Of all charter schools closed by local authorizers or their own boards, the vast majority had financial problems—meaning that they generally suffered from a lack of parental demand. It will be a rarity for parental assessments of quality to be at odds with those of local authorizers making decisions based on a lot more than test scores.

So I hope that we really agree that math and reading test results are not strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded. This means that we should instead accept the judgments of those with much more information about school quality, and it will be extremely rare that these more informed assessments of quality will be at odds with parental preferences.

(Also posted at Flypaper)


Local Control and Equity Do Not Mix

May 3, 2016

(Guest Post by James Shuls)

Some things just go together–cookies and milk, surf and turf, the Three Stooges (Moe, Larry, and Curly, of course)–you get the picture.  At the same time, there are things that simply do not mix.  When it comes to public education, two fundamental ideals fit this bill–local control and equity.  Try as we might, we simply haven’t figured out, in our current public education system, how to deliver on both of these principles.

Recently, NPR launched an initiative focused on the latter, the “School Money” project.  Reporters for the news outlet asked the question: Why do some schools have so much, while others have so little?  They summed it all up very succinctly:

Two words: property tax

The NPR reporters go on to say:

The problem with a school-funding system that relies so heavily on local property taxes is straightforward: Property values vary a lot from neighborhood to neighborhood, district to district. And with them, tax revenues.

There is just one problem with this answer–it’s wrong.

Imagine if we replaced the local property tax with a different local tax; it could be a local sales tax, a local income tax, whatever. No matter what we tax, as long as we collect it locally, independent school districts will generate different amounts of money. It is not the property tax that causes inequities. It is our very system of public education itself; it is the local school district that causes inequities.

More specifically, it is the combination of local school districts and local support for public schools that causes differences in school spending.  Local school districts use the power of taxation to build new schools, to increase teacher pay, and to provide services for students. Interestingly, when given the opportunity, many local school districts tax themselves above and beyond any amount required by the state.

Here is the real kicker – the rich tend to tax themselves more.  In my home state of Missouri, for example, the 50 highest-spending districts have an average tax rate ceiling for operating funds of $4.582 per $100 of assessed valuation. The 50 lowest-spending districts tax themselves at just $3.029 per $100 of assessed valuation. These districts not only have lower property values, on average; they also choose to tax themselves less.
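
To see how the rate gap compounds the property-value gap, consider a quick levy calculation.  Only the two tax rates come from the figures above; the assessed values are hypothetical, chosen to make the arithmetic visible.

```python
# Local levy = (assessed value / 100) * rate per $100 of assessed value.
# The two rates are the Missouri figures cited above; the assessed
# values are hypothetical, chosen only to make the arithmetic visible.

rate_high = 4.582   # top-50-spending districts, per $100 of assessed value
rate_low = 3.029    # bottom-50-spending districts, per $100 of assessed value

assessed_high = 300_000  # hypothetical home in a wealthier district
assessed_low = 150_000   # hypothetical home in a poorer district

levy_high = assessed_high / 100 * rate_high   # $13,746.00
levy_low = assessed_low / 100 * rate_low      # $4,543.50

print(f"Wealthier district levy: ${levy_high:,.2f}")
print(f"Poorer district levy:    ${levy_low:,.2f}")
# Property values here differ by 2x, but the levies differ by about 3x,
# because the wealthier district also chooses the higher rate.
```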

In other words, when we allow property rich school districts to tax themselves more and property poor school districts to tax themselves less, we allow taxpayers to willfully contribute to the inequities we see between districts.

Many people, including the authors of the NPR reports, point to the gaps in spending and take a reductionist approach to this complex system. It’s the property tax!  As much as I enjoy discussions about school finance, the issues here are much more fundamental: some people willingly choose to invest more in their children’s education than others.  These people appear to sort into communities with like-minded people.

It is easy to ask the question, “Why do some schools spend more than others?” It is much harder to answer the question, “Should we allow local taxpayers to have some say in how much they will support their local schools?” The former can be answered objectively, the latter cannot. If you say, “No,” you are saying you are uncomfortable with local control. If you answer, “Yes,” then you support some level of inequity.  In the current system, we can’t have both local control and equity.

If at the end of NPR’s “School Money” project we have only answered the easy questions, we will be no better off.  We will simply continue to wrestle with the same issues that we have grappled with for decades.  We will continually struggle to reconcile two incompatible ideals – local control and equity.

If, however, we start to think about how the fundamental organization of our school system—a patchwork of 14,000 school districts with geographic monopolies over the residents who live within them—contributes both to spending and educational inequities and think about how we can reform that, we might be able to move the discussion forward.

———————————————————

James V. Shuls, Ph.D., is an assistant professor of educational leadership and policy studies at the University of Missouri – St. Louis and a distinguished fellow of education policy at the Show-Me Institute.


Shut bad schools for low performance, but don’t draw conclusions from test scores alone

May 3, 2016

(Guest Post by Michael J. Petrilli)

Editor’s note: This post is the second in an ongoing discussion between Fordham’s Michael Petrilli and the University of Arkansas’s Jay Greene that seeks to answer this question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results? The first entry [by Jay] can be found here.

The prompt for this forum promised that we would explore “areas of agreement and disagreement.” I’m pleased, Jay (and not altogether surprised), to see that we share a lot of common ground. Let me start with that, then save what I see as our major dispute (what we can learn from reading and math scores) for another post.

I’m thrilled that you dismissed the extreme position of some libertarians, who argue that society should never override the choices of parents. You write:

I…do not mean to suggest that policy makers should never close a school or shutter a program in the face of parental demand. I’m just arguing that it should take a lot more than “bad” test scores to do that.

I agree entirely, and on both counts. First let me explain why “we” should, on rare occasions, close a school that some parents prefer. And second, let me discuss what else beyond “bad” test scores we might consider when doing so.

You and others have heard me argue ad nauseam that because education is both a public and a private good, it’s only fair that both parties to the deal have a say in whether that good is, well, good enough. We both abhor a system whereby a district monopoly assigns children to schools and parents must accept whatever is handed to them. But the flip side is that we should also reject chronically low-performing schools—those that don’t prepare their young charges for success academically or otherwise—and deem them undeserving of taxpayer support. They aren’t fulfilling their public duty to help create a well-educated and self-sufficient citizenry, which is what taxpayers are giving them money to do.

Furthermore, there are real financial and political costs to letting bad schools—including schools of choice—fester. We see this in many cities of the industrial Midwest—Detroit, Cleveland, and Dayton come to mind—where too many schools are chasing too few students. Perhaps the marketplace forces of “creative destruction” will eventually take hold and the weakest schools will disappear, allowing the remaining ones sufficient enrollment to ensure their financial sustainability and a higher level of program quality. But that process is taking an awfully long time, particularly when we’re talking about disadvantaged children who have no time to waste. The charter sectors in these cities would be stronger—academically, financially, and politically—if authorizers stepped in to close the worst schools. But some libertarians see that as paternalistic government intrusion. I think they are misguided; I hope that you agree.

Now to your second point, that “it should take a lot more than ‘bad’ test scores” to “close a school or shutter a program in the face of parental demand.” Hear, hear! This is the genius of effective charter school authorizers that look at a school’s big picture as well as its scores. Fordham’s Dayton office strives hard (and with fair success) to be that kind of authorizer. We certainly look at test scores—especially individual student progress over time, a.k.a. “value added.” But we also examine lots of other indicators of school quality, operational efficiency, and financial sustainability. (See our current accountability framework in the appendix here.) And most importantly, we know our schools intimately. We attend their board meetings, conduct site visits frequently, and get up close and personal.

So when we consider the painful step of closing a school (which we’ve had to do a handful of times), we’re hardly just sitting in our offices “looking at spreadsheets of test scores.” The same goes for other leading authorizers nationwide.

Not that it’s easy to identify measures beyond reading and math scores that are valid and reliable indicators of school success. I share your enthusiasm for character education, non-cognitive skills, high school graduation rates, and long-term outcomes such as college completion and labor market earnings. And I’d love to see states maintain regular testing in history, geography, science, and more. Whenever we can use those scores, we absolutely do. But as the early debate around the Every Student Succeeds Act illustrates, measures of character and non-cognitive skills don’t appear ready for prime time, and they may never be appropriate for high-stakes decisions. High school graduation rates, meanwhile, are perhaps the phoniest numbers in education. Long-term outcomes are just that—long-term, making them both difficult to tie to a school (especially an elementary or middle school) and not very helpful for making decisions in the here and now. And there’s no political appetite for more testing; if anything, everyone wants less. (Let me know if you disagree with my analysis.)

So where does that leave us? As far as I can tell, facing a trade-off, which is the normal state of affairs in public policy. We can either use reading and math gains as imperfect indicators of effectiveness while working to build better measures—buttressed by school visits and the like—or we can succumb to “analysis paralysis” and do nothing.

I know which one I prefer. What about you?

(Also posted at Flypaper)


Psssst, WaPo, Your Bias Is Showing!

May 2, 2016

(Guest Post by Jason Bedrick)

Congress voted on Friday to reauthorize the D.C. Opportunity Scholarship Program (OSP) and the Washington Post‘s headline could barely contain its exasperation:

GOP House passes D.C. private schools voucher program. Again.

Cute, right? But it gets better. (And by “better” I mean “worse.”)

Here’s how the WaPo reporter characterized support for the program:

Local D.C. leaders have long been against the voucher program, arguing that it diverts money and students away from the public school system. But federal funding for the local schools system is tied to the legislation, and Mayor Muriel E. Bowser (D) and some council members have expressed support for the bill.

So unnamed “local D.C. leaders” oppose the voucher program, but the Democratic mayor and “some” council members support it. How many council members?

Bowser and eight council members wrote in a March letter to congressional leaders that a reauthorization of the act is “critical to the gains that the District’s public education system has seen.”

Eight members supported the voucher program… Well, how many members are on the D.C. city council? Thirteen, you say? So more than 60 percent of the council supports the voucher program, and WaPo calls that “some.”

Throw in support from the current mayor and previous Democratic D.C. Mayors Anthony Williams, Adrian Fenty, and even Marion Barry (!), and WaPo‘s characterization that “local D.C. leaders have long been against the voucher program” looks even more ridiculous. Given that the majority of the city council and the majority of recent mayors support the OSP–to say nothing of the longstanding support from the WaPo editorial board–it would be equally if not more true to say that “local D.C. leaders have long supported the voucher program.” At the very least, WaPo could have actually named a few of the voucher opponents who are “local leaders” (the article cites only D.C. Del. Eleanor Holmes Norton) and written “local D.C. leaders have long been divided over the voucher program.”

WaPo, you can do better than that.



The weak predictive power of test scores

May 2, 2016

Here’s my first round in the debate with Mike Petrilli over whether test scores are reliable indicators of quality that can be used by regulators and policymakers to identify schools to be closed or expanded…

——————————————————-

The school choice tent is much bigger than it used to be. Politicians and policy wonks across the ideological spectrum have embraced the principle that parents should get to choose their children’s schools and local districts should not have a monopoly on school supply.

But within this big tent there are big arguments about the best way to promote school quality. Some want all schools to take the same tough tests and all low-performing schools (those that fail to show individual student growth over time) to be shut down (or, in a voucher system, to be kicked out of the program). Others want to let the market work to promote quality and resist policies that amount to second-guessing parents.

In the following debate, Jay Greene of the University of Arkansas’s Department of Education Reform and Mike Petrilli of the Thomas B. Fordham Institute explore areas of agreement and disagreement around this issue of school choice and school quality. In particular, they address the question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results?

To a very large degree, education reform initiatives hinge on the belief that short-term changes in reading and math achievement test results are strong predictors of long-term success for students. We use reading and math test scores to judge the quality of teachers, schools, and the full array of pedagogical, curricular, and policy interventions. Math and reading test scores are the yardstick by which education reform is measured. But how good a yardstick is it?

Despite the centrality of test scores, there is surprisingly little rigorous research linking them to the long-term outcomes we actually care about. The study by researchers from Harvard and Columbia (Chetty et al.) showing that teachers who increase test scores improve the later-life earnings of their students is a notable exception to the dearth of evidence on this key assumption of most reform initiatives. But that is one study; it has received some methodological criticism (although I think that has been addressed to most people’s satisfaction); and its results from low-stakes testing may not apply to the high-stakes purposes for which we would now like to use them. This seems like a very thin reed on which to rest the entire education reform movement.

In addition, we have a growing body of rigorous research showing a disconnect between improving test scores and improving later-life outcomes. I’ve written about this at greater length elsewhere (see here and here), but we have eight rigorous studies of school choice programs in which the long-term outcomes of those policies do not align with their short-term achievement test results. In four studies, charter school programs that produce impressive test score gains appear to yield little or no improvement in educational attainment. In three studies of private school choice programs and one study of a charter school program, we observe large benefits in educational attainment and even earnings but little or no gains in short-term test score measures.

If policy analysts and the portfolio managers, regulators, and other policy makers they advise were to rely primarily on test scores when deciding which programs or schools to shutter and which to expand, they would make some horrible mistakes. Even if we ignore the fact that most portfolio managers, regulators, and other policy makers rely on the level of test scores (rather than gains) to gauge quality, math and reading achievement results are not particularly reliable indicators of whether teachers, schools, and programs are improving later-life outcomes for students.

What explains this disconnect between math and reading test score gains and later-life outcomes? First, achievement tests are only designed to capture a portion of what our education system hopes to accomplish. In particular, they are not designed to measure character or non-cognitive skills. A growing body of research is demonstrating that character skills like conscientiousness, perseverance, and grit are important predictors of later-life success (see this, for example). And more recent research by Matt Kraft, Kirabo Jackson, and Albert Cheng and Gema Zamarro (among others) shows that teachers, schools, and programs that increase character skills are not necessarily the same as those that increase achievement test results. There are important dimensions of teacher, school, and program quality that are not captured by achievement test results.

Second, math and reading achievement tests are not designed to capture what we expect students to learn in other subjects, such as science, history, and art. Prioritizing math and reading at the expense of other subjects that may be important for students’ later-life success would undermine the predictive power of those math and reading results.

Third, many schools are developing strategies for goosing math and reading test scores in ways that may not contribute to (and may even undermine) later-life success. The fact that math and reading achievement results are overly narrow and easily distorted makes them particularly poor indicators of quality and weak predictors of later-life outcomes.

I do not mean to suggest that math and reading test results provide us with no information or that we should do away with them. I’m simply arguing that these tests are much less reliable indicators of quality than most policy analysts, regulators, and policy makers imagine. We should be considerably more humble about claiming to know which teachers, schools, and programs are good or bad based on an examination of their test scores. If parents think that certain teachers, schools, and programs are good because there is a waiting list demanding them, we should be very cautious about declaring that they are mistaken based on an examination of test scores. Even poorly educated parents may have much more information about quality than analysts and regulators sitting in their offices looking at spreadsheets of test scores.

I also do not mean to suggest that policy makers should never close a school or shutter a program in the face of parental demand. I’m just arguing that it should take a lot more than “bad” test scores to do that. Yes, parents can and will make mistakes. But analysts, authorizers, regulators, and other policy makers also make mistakes, especially if they rely predominantly on test results that are, at best, weak predictors of later-life success. The bar should be high before we are convinced that the parents are mistaken rather than the regulators poorly guided by test scores. Besides, we should prefer letting parents make mistakes for their own children over distant bureaucrats making mistakes for hundreds or thousands of children while claiming to protect them.

(Also posted at Flypaper)