If You Mostly Care About Test Scores, Private School Choice Is Not For You

April 28, 2017

If you mostly care about test scores, private school choice is not for you.  Although the vast majority of randomized controlled trials (RCTs) of private school choice show significant, positive test score effects for at least some subgroups of students, some of those gains have been modest and others have been null.  And now we have two RCTs, in Louisiana and DC, showing significant test score declines for at least some subgroups and in some subjects.  The Louisiana decline is large and across-the-board, but the significant, negative effect in the new DC study appears to be “driven entirely by students in elementary grades not previously in a needs-improvement school.”

People will quibble over why these new DC results showed at least a partial decline.  They will note that the prior RCT of DC vouchers showed significant test score gains after three years (although the p value rose to .06 in year four even as the positive estimate remained).  They will note that vouchers in DC are worth only about a third as much as the per-pupil funding received by DC’s traditional public schools and about half as much as DC’s charter schools.  Imagine how voucher students might do if they received comparable resources (and yes, resources can matter if there are proper incentives to use them productively).  They will note that almost half of the control group attended charter schools, so to a large degree this study compares how students do with vouchers relative to charters.

But these quibbles largely miss the point: the benefits of private school choice are clearly evident in long-term outcomes, not near-term test scores.  In the same DC program that just produced disappointing test score effects, using a voucher raised high school graduation rates by 21 percentage points.  Similarly, private school choice programs in Milwaukee and New York City were less impressive in their test score effects than in later educational attainment, where private school students in both cities were significantly more likely to enroll in college.

But if what you really care about is raising test scores, you’d be pushing no-excuses charter schools.  Rigorous evaluations, like the one in Boston, show huge test score gains for students randomly assigned to no-excuses charter schools.  You don’t even have to have school choice to produce these gains.  The same team of researchers showed that schools converted into no-excuses charters as part of a turnaround effort produced similarly big gains for students who were already there and did nothing to choose it.  The lesson that a fair number of foundations and policymakers draw is that we don’t need this messy and controversial choice stuff.  They believe that they have discovered the correct school model — it’s a no excuses charter — and all we need to do is get as many disadvantaged kids into these kinds of schools as we can, with or without them choosing it.

Unfortunately, no excuses charters don’t seem to produce long-term benefits that are commensurate with their huge test score gains.  The Boston no excuses charter study, for example, shows no increase in high school graduation rates and no increase in post-secondary enrollment despite large increases in test scores.  It’s true that students from those schools who did enroll in post-secondary schooling were more likely to attend a four-year than a two-year college, but it is unclear whether this is a desirable outcome, given that it may be a mismatch for their needs, and this more nuanced effect is not commensurate with the giant test score gains.

This same disconnect between test scores and later life outcomes exists in several rigorously conducted studies of charter schools, including those of  the Harlem Promise Academy, KIPP, High Tech High, SEED boarding charter schools, and no excuses charters in Texas.  While of course we would generally like to see both test score gains and improved later life outcomes, the thing we really care about is the later life outcomes.  And the near-term test scores appear not to be very good proxies for later life outcomes.

So, what should we think about these new test results from DC vouchers, showing some declines for students after one year in the program?  We already know from rigorous research that the program improves later life outcomes, so I don’t think we should be particularly troubled by these test results.  It may be that control group students are in schools that will fare as well or better on test score measures.  But we should remember that 42% of that control group are in the types of charter schools that other research has shown can produce giant test score gains without yielding much in later life outcomes.  And we know that treatment group students are in a program that has previously demonstrated large advantages in later life outcomes.

I understand that many reporters, foundations, and policymakers act like they mostly care about test scores and these new results from DC have them all aflutter.  But if people could only step back for a second and consider what we are really trying to accomplish in education, the evidence is clearly supportive of private school choice in DC and elsewhere.

(edited to correct error noted in comments)


AEI Releases Education Savings Accounts: The New Frontier in School Choice

April 26, 2017

(Guest Post by Matthew Ladner)

Get your copy: all the really cool kids are reading it!  A great collaborative project addressing the promise, practicalities, and pitfalls of an account-based system of parental choice.  Here is a chapter summary written by some of your favorite edu-nerds (er, thinkers)!  RUN, don’t walk, to order your copy!

“Introduction”
Adam Peshek and Gerard Robinson

“You Say You Want an Evolution? The History, Promise, and Challenges of Education Savings Accounts”
Matthew Ladner

“The Constitutional Case for ESAs”
Tim Keller

“Education Savings Accounts: The Great Unbundling of K–12 Education”
Adam Peshek

“Public and Policymaker Perceptions of Education Savings Accounts: The Road to Real Reform?”
Robert C. Enlow and Michael Chartier

“The ESA Administrator’s Dilemma: Tackling Quality Control”
John Bailey

“State Education Agencies, Regulatory Models, and ESAs”
Gerard Robinson

“Parents and Providers Speak Up”
Allysia Finley

“Hubs and Spokes: The Supply Side Response to Deregulated Education Funding”
Michael Q. McShane

“Settling on Education Savings Accounts”
Nat Malkus

“Conclusion”
Nat Malkus


Fordham Sees Signs of Charter Failure Even When They Aren’t There

April 25, 2017


Fordham has a new report out that claims to have discovered three warning signs in charter applications that make those charters more likely to have low-performance in their initial years.  If this were true, it would be a major development given that prior research has failed to find characteristics of charter applications that predict later academic outcomes.  Unfortunately, a straightforward interpretation of the results in Fordham’s new report suggests that there are no reliable predictors of charter failure.  Despite organizations like the National Association of Charter School Authorizers (NACSA) receiving millions of dollars from foundations and even receiving contracts from states to evaluate applications, there are no scientifically validated criteria for predicting charter failure from their applications.

The Fordham analysis obtained charter applications from four states.  They then focused on the successful applications and identified 12 factors that could be coded consistently in charter applications that they thought might be related to future charter performance.  The authors conducted a series of 12 logit regressions to see if any of these 12 factors were significantly related to charters being low-performing later on.  Only one of those 12 factors was significantly related.  Charter applications that said they intended to serve at-risk students were significantly more likely to be low performing on standardized tests.  (See Table C-1)  Other than that, no other factor was significantly related to charter performance.

So, Fordham might have concluded that if you want to avoid authorizing low-performing charter schools, stay away from charters that serve disadvantaged kids.  Of course, this would be a little like advising people who want to be millionaires to first start with a million dollars.  All that the finding reveals is that their measure of charter outcomes is a lousy measure that fails to capture how charters might really help disadvantaged students.

But don’t worry, Fordham never highlights the straightforward results presented in Table C-1.  Instead, the authors engage in a convoluted exercise in data mining to see if they can’t turn up some more palatable and marketable results.  So, they engage in a mechanical process of adding and removing these 12 factors and interactions of those 12 in a single model until they arrive at a “best fit.”  This is exactly the type of atheoretical mining of data that we warn our students not to do.  You should have variables in your model because you think they are theoretically related to the dependent variable, not because you tried every combination and this is the one that gave you the best fit.

In total, there are 78 possible variables if you count each of the 12 factors plus all 66 pairwise interactions of those 12.  By chance alone we would expect roughly four of those 78 variables (78 × .05 ≈ 3.9) to turn up statistically significant at the 5% level, and sure enough the Fordham analysis finds three significant factors.  This time they find that charters focused on at-risk students are more likely to fail if that is combined with failing to propose intensive small-group instruction or tutoring.  They also find that charters that fail to name their leaders are more likely to fail, but only if that factor is combined with the charter not being part of a charter management organization (CMO).
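The false-positive arithmetic can be sketched in a few lines of Python (a hypothetical simulation of pure-noise predictors, not Fordham's actual data):

```python
# Hypothetical illustration: if all 78 candidate predictors were pure noise,
# a 5% significance test would still flag about 78 * 0.05 ~= 4 of them
# as "significant" by chance alone.
import random

random.seed(1)

N_PREDICTORS = 12 + (12 * 11) // 2  # 12 factors + 66 pairwise interactions = 78
ALPHA = 0.05
N_SIMULATIONS = 10_000

false_positives = []
for _ in range(N_SIMULATIONS):
    # Each noise predictor independently clears p < alpha with probability alpha
    hits = sum(1 for _ in range(N_PREDICTORS) if random.random() < ALPHA)
    false_positives.append(hits)

expected = ALPHA * N_PREDICTORS
simulated = sum(false_positives) / N_SIMULATIONS
print(f"Expected chance 'findings' among {N_PREDICTORS} noise predictors: {expected:.1f}")
print(f"Simulated average: {simulated:.1f}")
```

The point is simply that screening 78 candidate variables at a 5% threshold is expected to produce a few "significant" results even when nothing real is there.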

Fordham can invent post-hoc rationalizations of why these particular combinations of factors make sense, but the main point is that this is all post-hoc and could well just be the result of chance.  And it is important to emphasize that proposing intensive small-group instruction, being part of a CMO, or even naming a school leader are not, on their own, predictors of later test scores.  It’s only when interacted, and in this particular combination of variables, that they reach statistical significance.

Even worse, the “best fit” model finds that a factor that was not a significant predictor in the straightforward analysis becomes significant when included with these other factors.  So, Fordham concludes that schools proposing a child-centered pedagogy are more likely to have low performance even though the straightforward analysis did not find this.  When results vary depending on atheoretical changes in model composition we call those findings unstable or not robust.  But Fordham is undeterred and draws the conclusion that child-centered schools may be bad despite this instability in the result.

The truth is that this report provides no scientific evidence on factors that predict future low performance by charter schools.  I know that Fordham is determined to find Signs, but in this case they are likely just seeing the chance result of an atheoretical data-mining exercise.


It’s Too Much Winning Arizona!

April 25, 2017

(Guest Post by Matthew Ladner)

The Arizona winning just does not stop: BASIS claims five of U.S. News and World Report’s Top 10 high schools.

You other 49 states are cordially invited to join in the winning. We’ve yet to find any point of diminishing marginal returns here in the Cactus Patch.

#TooMuchWinningAZ

April 24, 2017

(Guest Post by Matthew Ladner)

This weekend, the Arizona Republic editorial board cited a forthcoming report from the Morrison Institute to note that Arizona was the only state to achieve statistically significant increases on all six NAEP exams between 2009 and 2015. I decided to check it out.

The 2009 to 2015 period was not chosen arbitrarily, and has a good deal of historical significance. We can begin to track science achievement under the new framework starting in 2009. It is highly desirable to include science, as it is a “non-tested” subject for state accountability purposes. Starting the clock in 2009 is also useful historically as it tells us which states coped best with the Great Recession.

The chart above shows the net of statistically significant gains minus statistically significant declines, by test, for the 2009 to 2015 period across 4th and 8th grade Math, Reading, and Science.  Only Arizona hit the maximum of six statistically significant improvements with zero statistically significant declines, and so earned a score of six.  South Dakota apparently had the worst overall performance with a minus three.  Boring but necessary note: a few states (AK, CO, KS, NE, LA, PA, and VT) did not participate in the Science exams, so their scores could only range between -4 and 4.
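The scoring scheme is simple enough to express directly. Here is a minimal sketch; Arizona's 6-0 record is reported above, but the other state entries are hypothetical placeholders, not figures from the Morrison Institute report:

```python
# Net NAEP score: statistically significant gains minus statistically
# significant declines across the six tests (4th/8th grade Math, Reading,
# and Science, 2009-2015).

def net_score(sig_gains: int, sig_declines: int) -> int:
    """Net of significant gains minus significant declines."""
    return sig_gains - sig_declines

# Arizona's 6-0 is from the post; the other entries are made up for illustration.
states = {
    "Arizona": (6, 0),
    "Hypothetical State A": (2, 1),
    "Hypothetical State B": (0, 3),  # a South-Dakota-like minus three
}

for state, (gains, declines) in states.items():
    print(f"{state}: {net_score(gains, declines):+d}")
```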

I think the President may have been referencing AZ NAEP scores when he said:


Review of Letters to a Young Education Reformer

April 23, 2017

Below is an edited version of a review of Rick Hess’ new book that John Thompson, educator and frequent internet commentator, sent to me.  While I’m sure that John and I do not see eye to eye on all things (I think I’m much shorter), I find his perspective valuable and there is much in this review that I find useful.  The original came in at over 5,000 words spread across multiple posts, but with his permission I have edited it down to about 2,000 words in a single post. Enjoy.

(Guest Post by  John Thompson)

I’m not sure that I completely believe him, but Rick Hess concludes his Letters to a Young Education Reformer by saying he’s not a nice guy. He chides the last generation of school reformers not for the not-nice things they’ve done, but for ignoring too many key tenets of professionalism. He also shares some valuable thoughts with “far-from-young” reformers and veteran teachers like me who still have a hard time grasping how and why the accountability-driven, competition-driven social engineering experiment was imposed on our nation’s schools.
Hess describes himself as a “little-r reformer,” as opposed to a “Big-R Reformer.” Little-r reformers believe that schools can do a far better job and that schooling must be reimagined. They are less confident than Big-R Reformers that they know the answers. Hess seeks a “big-tent” approach to education and a small-d democratic vision for public education. Big-R Reform, however, “has congealed into a set of prescriptions, it has grown more bureaucratic and self-assured, and further and further removed from the intuitions of little-r reform.”
Similarly, Big-P Philanthropy has enabled the hubris of Big-R Reform and furthered the move toward micromanaging diverse schools across the nation. When Big-R Reform, Big-P Philanthropy, and an activist federal Department of Education join together in an effort to socially engineer public education, dissent can be quashed. I would add that when the clash of ideas is driven out of schools, the way it often has been during the last 15 years, democracy is undermined.
Hess explains to young reformers why they should learn to control their passion. Thinking that they are uniquely on the side of angels, reformers pushed the soundbite, “This is about kids, not adults!” In doing so, novice reformers remained oblivious to another of Hess’ truisms – “implementation matters.” Real world, Hess explains, “For better or worse, good schools are the product of thousands of tiny judgments that those educators make every day.” So, by definition, if you want to improve kids’ lives, continually disrespecting teachers is not the way to transform “the status quo.”
Hess does a great job in explaining how and why reformers often display little patience for opposing ideas or for obstacles to their grand theories. First, they were in much too much of a hurry to learn from history’s missteps. Reformers, who often had two or three years of classroom experience – or less – quickly developed an extreme case of “groupthink.” Not knowing what they didn’t know about the history of “silver bullets” that have been hurriedly and repeatedly dumped on our schools, “this or that group of reformers” have demonstrated a clear pattern where they “settle on an agenda and then dismiss doubters as troublemakers.”
As much as it pains me to admit this about a conservative, Hess offers the single most telling anecdote illustrating the irrationalities that groupthink can produce. In late 2002, Hess attended a secret briefing at the Pentagon about the Bush administration’s educational mission in Iraq. It was clear that nobody had much of an idea regarding the situation they would be facing. One issue dominated the meeting, however. Iraq needed its own version of No Child Left Behind!
Hess isn’t a fan of high-stakes testing and he is skeptical of the value-added teacher evaluations that were pushed by the Gates Foundation and the Duncan administration. He credits reformers for ending the “old stupid,” or ignoring data systems, while concluding “the slapdash embrace of half-baked data is ‘the new stupid.’” Hess estimates that test scores “reflect 30 to 35% of what we want schools to do.” The use of those metrics was supposed to move us into the “moneyball,” or the data-informed baseball coaching that was popularized by Michael Lewis. Real world, data moved schools into the pre-moneyball era. But, reformers chose to act nice by talking about teachers as if they are girl scouts. They then used value-added models in ways that teachers were bound to see as a “hatchet job.”
One of the best things about Hess, the little-r reformer, is that he advises reformers to learn from history’s missteps. He understands that “implementation matters,” yet reformers can have little patience for opposition or obstacles to the experiments that they mandate. As Hess has watched “this or that group of reformers settle on an agenda and then dismiss doubters as troublemakers,” he has been dismayed by the “groupthink” that has grown out of their frustrations.
My favorite Hess statement is that Big-R Reformers “learned the lyrics, not the music.” I’ve repeatedly heard reformers who had little or no experience in the classroom complain that attaching stakes to test scores did not need to produce teach-to-the-test, basic skills instruction. They demand that “everyone sing from the same hymnal” but deny that any words in the lyrics require drill and kill. Being clueless about the people side of schooling, Big-R Reformers never understood that it was not what they said that matters. What matters is what school systems would hear.
Of course, test-driven accountability, as well as the use of test scores as ammunition in the fight between charters and neighborhood schools, forced administrators and teachers to engage in bubble-in malpractice. The big harm came from the rapid scaling up of high-stakes testing directed at individual teachers, students, and charters.  Even in the early days of No Child Left Behind, educators had plenty of options for pretending to comply with mandates while, predictably, shutting their classroom doors and continuing to teach in the same old, good and bad ways.  Reformers responded by doubling down on the punitive, in terms of both the survival of schools and the evaluations of individuals, and by a “growing fascination with PR campaigns and political strategies.”
As with NCLB, the Obama administration imposed quantifiable targets that obviously were impossible to meet. I don’t know when Hess attended the meeting described in Letters to a Young Education Reformer, but he recalls “a no-nonsense veteran” state administrator in Florida who said he could manage about seven turnarounds. The audible shock that he prompted would have been funny if it hadn’t illustrated the reality-free nature of the campaign for mass transformations of the lowest-performing 5% of schools.
Hess is especially perceptive in diagnosing the predictable failure of Race to the Top (RttT) and School Improvement Grants (SIG). I don’t know how many 500-page RttT applications on a nineteen-item checklist Hess read but he reached the same conclusion that I did after studying many of them. There was no need to read the lyrics when the RttT hit an unmistakable chord. The applications’ words didn’t explicitly forbid the investment of time and money into the aligned and coordinated student supports that would have provided the foundation necessary for increases in meaningful learning. The timeline and the accountability metrics made it inevitable that hurried, in-one-ear-out-the-other, teach-to-the-test would take off.
In his dealings with national reformers, Hess saw what I witnessed on a local level. Reform leaders enthusiastically embraced the RttT even though “many of the folks in charge had – until about five minutes earlier – been eloquent in explaining how bureaucracy had stymied school reform.”  They had sincerely prided themselves on their opposition to red tape and their entrepreneurialism, but they turned on a dime because, “When your buddies go off to war, you go with them.”
Hess then nails the dynamics which, I believe, made the damage done by corporate reform increase during the Obama years, “When foundations and the federal government link arms, disagreeing with the president’s policies is tantamount to attacking the foundation’s agenda – and vice versa.” He then calls for “little-p rather than big-P philanthropy,” more rethinking, and less defending of the agenda of the moment.
Hess also witnessed the rise of the public relations campaigns that grew out of the effort to immediately impose transformative change. It looks to me that Big-R Reform peaked in the early Obama years when teacher-bashing propaganda like Waiting for Superman was dominant. Hess adds telling details about the way Big-R Reformers sought An Inconvenient Truth for school reform. He clearly remembers one PR pro who said the reform message needed to be “simpler, stupider, and snazzier.” At the time, it was argued that reformers were “too thoughtful for their own good.”
 …
For years, I tried to explain to Democrats who pushed the Big-R Reform agenda that in education it’s the music, not the lyrics, that matters, but perhaps Hess is better at getting that point across. I would argue that a huge reason for the miscommunication is that Big-R Reformers were disgusted by the timidity, the “culture of compliance,” of school systems, and they tried to intimidate the education sector into courageousness.
Not understanding the education sector’s culture of powerlessness, as well as a history of “silver bullets” being continually imposed on schools, Big-R Reformers couldn’t understand why systems remained so cautious. This prompted impatient reformers to become even more strident that punishments must always accompany rewards. They seemed to see disincentives as a normative and essential component of policies, and they seemed frustrated that educators focused on the punitive, not the incentives that corporate reformers also helped fund.
The best example of systems focusing on the music and not the lyrics is the predictable manner in which systems responded to value-added teacher evaluations.  When educators encountered test score growth models that were inherently biased against teachers in the highest-challenge schools, administrators weren’t likely to listen to the words of reformers who presented the new teacher evaluations as a means of recruiting and retaining talent in the inner city.
Reformers would explain that the use of “multiple measures” would make evaluations less inaccurate than value-added scores alone. Reformers were reluctant to put estimates of the inaccuracy rate on paper, but I often heard the guesstimate of 5 to 10%. Somehow, Big-R Reformers failed to comprehend that this would mean that inner-city teachers, especially, would face that much of a chance per year of having their careers damaged or destroyed by statistical errors. Reformers seemed incapable of putting themselves in the shoes of educators and understanding why systems would profess support for the measures but then “monkey wrench” them so that only 2% or so of teachers would be dismissed.
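To see why a 5 to 10% annual error rate would alarm teachers, it helps to compound it over a career. This back-of-the-envelope sketch is my own extrapolation, under the hypothetical assumption that errors are independent across years:

```python
# Probability of at least one erroneous, career-damaging flag over a span of
# years, given an annual false-flag rate and independence across years.

def cumulative_false_flag_risk(annual_error_rate: float, years: int) -> float:
    """1 minus the probability of escaping a false flag every single year."""
    return 1.0 - (1.0 - annual_error_rate) ** years

# The 5-10% figures are the "guesstimates" quoted in the review above.
for p in (0.05, 0.10):
    print(f"annual rate {p:.0%} -> 10-year risk {cumulative_false_flag_risk(p, 10):.0%}")
```

At a 5% annual error rate the ten-year risk is roughly 40%, and at 10% it is about 65%, which helps explain why educators heard the punitive music rather than the reassuring lyrics.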
Rather than refight the big battles where smart people read the same evidence in different ways, I’ll close with a NAEP test score chart cited by reformer Kevin Huffman in support of the contemporary reform movement. I’d argue that Huffman’s evidence makes a powerful case against his approach to school improvement. (Huffman was debating conservative Jay Greene. Once again, this liberal respects the analysis of a conservative reformer more than the neo-liberal or liberal reformers in the debate.)
Huffman’s graphic shows that 4th grade math scores rose nearly 20 points from 1996 to 2015. My first reaction is that reform has shown some success in improving math instruction, especially in the early years. That should not be a surprise given the sorry state of math instruction, especially in elementary schools, that I’d always seen as the norm. (Similarly, it should not be a surprise when input-driven reforms, like increasing high-quality tutoring or adding counselors or mentors, raise student performance, but that is not evidence in favor of output-driven reform.)
However, reform has largely failed to raise reading scores, especially in the older years. There also is a simpler example of how reformers twist themselves into pretzels in order to view this evidence as supportive of reform.
The first years when NCLB could have started to improve schools would have been around 2002 or 2003. Fourth grade test scores increased more in the seven years before 2003 than they did in the twelve years that followed the law’s accountability system.  In other words, even the subject which produced the law’s greatest success does not provide support for the effectiveness of school reform.
I would argue that the metric which is most important is 8th grade reading, which is the most valuable skill and the most reliable NAEP test given to the older students. (It’s hard to evaluate the reliability of 12th grade tests.) Those reading scores increased about as much in the four years that preceded NCLB as they did in the thirteen years which followed 2002. And the same pattern applies to all of the data that Huffman presented. If anything, NAEP test score growth slowed after NCLB, and often it stopped after the Obama administration put NCLB accountability on steroids.
I would not argue that NAEP scores, alone, prove that reform failed. But clearly NAEP scores don’t provide evidence that output-driven, market-driven reform increased student performance.
Some reformers reply with the idea that an accountability “meteor” hit schools in the late 1990s, so gains that preceded NCLB should be counted as evidence for its effectiveness. I don’t know how, but some smart reformers may see this argument as something other than intellectual dishonesty and/or Alt Truth. But that opens even more cans of worms in terms of why smart people see the same education evidence in very different ways.
And that brings us back to why we need Letters to a Young Education Reformer. The current and new generation of reformers may not find this comprehensible, but they need to know that there was a time when teachers were allowed to teach according to their professional judgment, and when the economy boomed, student performance increased markedly, more than anything accomplished by the test-driven, competition-driven approach to increasing student performance.
Gosh, I remember a day when teachers who taught in a meaningful and culturally relevant manner, and treated students as whole human beings, did not have to fear for their jobs for having the temerity to do so. If I go too far down that road, however, I’ll betray myself as even older than Rick Hess.
Finally, somebody needs to write: Letters to a Young Education Reformer, Obama Loyalist to Obama Loyalist.
(edited to correct error in book title)

A Society That Puts Freedom Before Equality Will Get a High Degree of Both

April 20, 2017

(Guest Post by Matthew Ladner)

Want proof?  Here is how “9/33” charter sectors did on the 2015 NAEP 8th grade math test.  First, let’s look at middle- and high-income kids in Arizona and Colorado charters compared to the statewide averages for middle- and high-income kids:

Whew! Would you look at that? I wonder whether those AZ and CO charter school kids are getting half the funding per pupil of Massachusetts. Yes, right, so back on track here: the kids above are all middle- and high-income, so how did low-income students fare in these awful, horrible, no-good Wild West anarchist charter schools? I’m glad you asked: