Beware of Mis-NAEPery but also NAEPilism

April 3, 2018

(Guest Post by Matthew Ladner)

The 2017 NAEP will be released next week, and a few notes seem in order. Over time, the term “mis-NAEPery” has slowly morphed into a catchall phrase meaning “I don’t like your conclusions.” Mis-NAEPery, however, has an actual meaning (or at least it should), something along the lines of “confidently attributing NAEP trends to a particular policy.”

Arne Duncan, for instance, recently took to the pages of the Washington Post to claim all positive NAEP trends since 1990 for his own tribe of reformers (center-left):

Lately, a lot of people in Washington are saying that education reform hasn’t worked very well. Don’t believe it.

Since 1971, fourth-grade reading and math scores are up 13 points and 25 points, respectively. Eighth-grade reading and math scores are up eight points and 19 points, respectively. Every 10 points equates to about a year of learning, and much of the gains have been driven by students of color.

Duncan then dismisses the possibility that student demographics had anything to do with this improvement, since the American student body has grown poorer and more diverse: “It should be noted that the student population is relatively poorer and considerably more diverse than in 1971.” This contention, however, deserves dispute, given that the inflation-adjusted income (in constant 2011 dollars) of the poorest fifth of Americans almost doubled between 1964 and 2011 once various transfers (food stamps, the EITC, etc.) are taken into account. Any number of other things, both policy and non-policy related, could also explain the positive trend, but never mind any of that: Mr. Duncan lays claim to all that is positive.

Duncan was not finished yet, however, as he was at pains to triangulate himself away from those nasty people who support more choice than just charter schools:

Some have taken the original idea of school choice — as laboratories of innovation that would help all schools improve — and used it to defund education, weaken unions and allow public dollars to fund private schools without accountability.

Well, that sounds a bit like how a committed leftist would (unfairly) describe my pleasant patch of cactus. Arizona NAEP scores, could you please stand to acknowledge the cheers of the audience:

So the big problem in that chart is the blue columns. These charts stretch from the advent of the Obama years until the most recently available data (until Tuesday, that is). We won’t be getting new science data this year, so ignore the last two blue columns on the right. What we are looking at is score changes of 1 point in 4th-grade math, -1 point in 8th-grade math, 1 point in 4th-grade reading and 2 points in 8th-grade reading. Only one state made statistically significant academic gains on all six NAEP tests during the Obama era, and it just so happens to be one of the states adopting the policies uncharitably characterized by Duncan’s effort at triangulation.

There were some very large initiatives during these years (Common Core standards, teacher evaluation, etc.), and we can’t be sure why the national numbers have been so flat, but let’s just say that a net gain of three scale points across four 500-point-scale tests fails to make much of an impression. Supporters of the Common Core project, for instance, performed a bit of a Jedi mind trick around the 2015 NAEP by noting that scores were also meh in states that chose not to adopt, and that 2015 was early yet. Fair enough on the early bit, but the promise behind an enormous investment of political capital in the project was not that adopting states would be just as meh as everyone else, but rather that things would get better.

Where’s the BETTER?!?

Duncan’s mis-NAEPery, however, is of the garden variety; there has been far worse. Massachusetts, for instance, instituted a multi-faceted suite of policy reforms in 1993, and its NAEP scores rose from a bit better than nearby New Hampshire’s to two bits better than New Hampshire’s and tops in the country. So far as I can tell, there was approximately zero effort to establish micro-level evidence on any of the multiple reform efforts, or, to the extent policies were having a positive impact, to disentangle which policies were doing what. That would be silly: everyone knows that standards and testing propelled MA to the top NAEP scores, and once everyone else does it we will surge toward the education Nirvana of Canadian PISA scores. Well, I refer the honourable gentleman to the tiny blue columns in the chart I referenced some moments ago.

This is not to say that I am confident that testing and standards had nothing to do with MA’s high NAEP scores. I’m inclined to think they probably did, but some actual evidence would be nice before imposing this strategy on everyone. In Campbell and Stanley terms, “Great Caesar’s Ghost! Look at those Massachusetts NAEP scores!” lacks evidence of both internal and external validity. In other words, we don’t know what caused MA’s NAEP scores, nor do we know who else, if anyone, might be able to pull it off, assuming policy had something to do with it.

So beware of mis-NAEPery, my son: the jaws that bite, the claws that catch! But also beware of NAEP nihilism. Taking off my social science cap, I will note that NAEP is an enormous and highly respected project, and it is done expressly for the purpose of making comparisons. Yes, we should exercise a high level of caution in doing so, and we should check any preliminary conclusions against other available evidence. The world is a complicated place, with an almost infinite number of factors pushing achievement up or down at any point. There is a great deal of noise, and finding the signal is difficult. NAEP alone cannot establish a signal.

The fact that the premature conclusions drawn from the Massachusetts experience lacked evidence of internal and external validity did not mean that those conclusions were wrong, but it did make them dangerous. Alas, the world does not operate as a random-assignment study. Policymakers must make decisions based on the evidence at hand: NAEP and (hopefully) evidence better than NAEP. The figure at the top of this post makes use of NAEP, and there is a whole lot of top-map green (early goodness) turning into bottom-map purple (later badness) going on. This is a bad look, assuming part of what you want out of your support for K-12 education is kids learning math and reading in elementary and middle school. Let’s be careful, but let’s also see what happens next.



Duncan and the Abuse of Research (As Well As Power)

February 24, 2012

Education Secretary Arne Duncan’s press statement on South Carolina was a bizarre display that accomplished the opposite of what it intended. As Greg pointed out, the statement’s harsh and threatening tone did nothing to support the claim that Common Core national standards and assessments are a purely voluntary consortium of the states. Instead, the statement was a not-so-veiled threat: if South Carolina followed through with a proposal to withdraw from Common Core, it would lose out on federal grants like Race to the Top and on waivers from impossible-to-satisfy NCLB requirements. If Common Core is purely voluntary, why the need for threats and intimidation from the Education Secretary?

In addition to this abuse of power (the US Department of Education is legally prohibited from establishing national standards, testing, and curriculum), Duncan’s statement also displayed an abuse of research. He distorted the findings of a National Center for Education Statistics (NCES) analysis to suggest that South Carolina had particularly weak performance standards, when the research had shown no such thing. Duncan claimed:

[Prominent Republicans] have supported the Common Core standards because they realize states must stop dummying down academic standards and lying about the performance of children and schools. In fact, South Carolina lowered the bar for proficiency in English and mathematics faster than any state in the country from 2005 to 2009, according to research by the National Center for Education Statistics.

South Carolina did significantly lower its performance standards between 2005 and 2009. But it did so because it had earlier raised those standards well above the national average. In the end, South Carolina had math and reading performance standards that were close to the national average and close to the NAEP standard for Basic.

One of the potential benefits of state control over performance standards is that states can raise or lower them so that they are neither so easy that everyone passes nor so hard that everyone fails. You have to hit the sweet spot between these points to motivate students and educators to improve without crushing them. Each state may have a different sweet spot and needs the flexibility to adjust in case it misses the mark (as SC initially did) or in case achievement improves (as has occurred in FL).

We actually had Jack Buckley, the Commissioner of NCES, out to give a lecture in Arkansas during which he presented this analysis. You can see a summary and the slides here.

Compared to what we could have had as an education secretary, Duncan has been pretty good.  He’s shown some independence from the teachers unions and supported some promising reforms, like charter schools.  But he’s ignored his own department’s research in seeking (multiple times) to kill the DC voucher program.  And he seems oblivious to the limits of power that he and the federal government have over education policy.  When people abuse their power they may also be more likely to abuse research.

This Deal Is Getting Worse All the Time

February 23, 2012

(Guest post by Greg Forster)

Shorter Arne Duncan: The U.S. Department of Education is not pressuring states to adopt Common Core. However, any state that takes action to resist Common Core will be immediately singled out by the Education Secretary for an extremely harsh public denunciation of its education system – which will obviously make it effectively impossible for the Department to look favorably upon that state when doling out grants and waivers for the foreseeable future.

The Solyndra of Digital Learning

September 19, 2011

Education Secretary Arne Duncan and Netflix CEO Reed Hastings have an op-ed in today’s Wall Street Journal that starts out great but then goes dramatically downhill. They begin by recognizing the amazing potential of digital learning:

In the past two decades, technology has revolutionized the way Americans communicate, get news, socialize and conduct business. But technology has yet to transform our classrooms. At its full potential, technology could personalize and accelerate instruction for students of all educational levels. And it could provide equitable access to a world-class education for millions of students stuck attending substandard schools in cities, remote rural regions, and tribal reservations.

But then they advocate for a federal government-backed corporation to realize digital learning’s potential:

Too often, the market for educational technology has been inefficient and fragmented. The nation’s 14,000 school districts, more than a few of which have byzantine procurement systems, have been inefficient consumers and have failed to drive consistent demand. And a robust R&D base for improving and refining educational technology has been sadly lacking.

To help remedy those gaps, the Department of Education is launching a unique public-private partnership called Digital Promise.

The last thing digital learning needs is a government-funded outfit to develop it. The government is particularly bad at picking technological winners and losers. And if the government pours money into Digital Promise and signals to states and districts that they should adopt what Digital Promise endorses, it will stifle a vibrant, developing marketplace that would otherwise experiment with different technologies and approaches to learn what works best.

If you don’t believe me that the government is particularly bad at picking winners and losers in technology, just look at the example of Solyndra. The government poured more than half a billion dollars of stimulus money into Solyndra’s technology for solar energy, believing that it would be the wave of the future. As it turns out, it backed a more expensive technology that failed to win in the marketplace. Solyndra recently declared bankruptcy, laying off more than 1,000 workers and blowing more than half a billion dollars of taxpayer money.

In addition to blowing taxpayer money by backing the wrong technology, Digital Promise is the digital learning equivalent of mandating Betamax.  If we privilege the wrong technology we will crowd out better solutions and productive innovation.

Giving taxpayer money to certain outfits also runs the risk of corruption, since political connections may well influence which companies and technologies get backed. This leads to crony capitalism, or crapitalism.

For the sake of digital learning, Mr. Secretary, please stop “helping” it with a government-backed organization like Digital Promise.

(Correction: Digital Promise is a nonprofit organization, but all the points still apply.)

Sen. Rubio Letter to Sec. Duncan on National Standards

September 14, 2011

Arne Duncan, Suuuuuuuuuper Geeeeeeenius!

August 12, 2011

(Guest post by Greg Forster)

Before he goes ahead with the plan to set himself up as America’s first one-man legislature, Arne Duncan might want to read this detailed, devastating takedown by Rick Hess.

This is pretty much what I was trying to get at in the comments earlier this week, except a whole lot better both on substance and humor value. I couldn’t stop laughing, and I also couldn’t stop crying.

(Although I do think I should get points for working in an Iron Chefs reference.)

If Duncan doesn’t pick up the clue Rick is putting out on the table for him, here’s how his tenure might be remembered:


Arne Duncan on Atlanta Cheating Scandal

July 21, 2011

(Guest Post by Matthew Ladner)

Social scientist Donald Campbell postulated that “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” The Army of Angry Teachers has seized upon the Atlanta cheating scandal as proof that the whole process of testing and transparency is destructive and ought to be done away with.

Arne Duncan weighs in on the Atlanta cheating scandal as part of a roundtable at the WaPo. Duncan provides a bit of much-needed perspective on the problems of testing, noting that although the Atlanta scandal is the worst uncovered, it involves 44 schools out of thousands in Georgia.

Secretary Duncan goes on to acknowledge a number of problems in state academic testing, including the far larger problem of states dumbing down their cut scores in order to proclaim improvement.

One elephant in the room: test security. It isn’t difficult to infer that while the state of Georgia performed erasure analysis on the tests (thus uncovering the cheating), it failed to let it be known that it would be doing so on a large scale (and thus failed to deter the cheaters, who thought they could get away with it). States need not only to employ these techniques but also to use them as deterrents.

People are quite clever, however, and constantly develop new ways to cheat if provided incentives to do so. It seems possible that a system of third-party test administration will need to be developed as we attach greater consequences to test scores, including school ratings and merit bonuses. This could be as simple as the way you took the SAT, or it could have a more high-tech look to it.

Another Campbell’s Law problem that strikes me as more serious than systematic answer changing by staff is the practice of teaching to test items. I fear that this is quite widespread, although it is difficult to quantify. The idea behind the standards movement is to teach to a set of academic standards, and to use testing to measure success. If teachers instead teach to a set of test items then the whole process can devolve into a farce.

A skillfully managed system of student testing can play, and has played, a leading role in improving student outcomes. It’s difficult to pull off, and easy to foul up. We should be concerned about staff-led cheating. We should be even more concerned about low cut scores, item exposure and test study guides.