Cool Honesty Gap Graphics on Truth in Advertising in State Testing

(Guest Post by Matthew Ladner)

So yesterday the Godfather of the School Choice movement presented evidence that nuked an oft-repeated claim regarding the great dummy down of American testing. Today the folks at Honesty Gap mop up the poor irradiated wretches to put them out of their misery with some cool graphics showing the increased alignment between state tests and NAEP like:



Now for people like me who support Arizona developing a set of standards unique to Arizona and replacing the current CC standards and possibly test so long as they are as good or better, the presence of Oklahoma on that last chart is a problem. For anti-CC crusaders, it’s an even larger problem, unless of course you are just shameless in your support for tests that a chimp could sometimes pass blindfolded. Yes Tex I am looking at you. A constructive vote of no-confidence remains a respectable path to leaving CC in the rear view mirror, so I hope Oklahoma will pull it off in plenty of time for the 2017 NAEP.

37 Responses to Cool Honesty Gap Graphics on Truth in Advertising in State Testing

  1. Greg Forster says:

    Constructive vote of no confidence is not how the US system of government works:

  2. matthewladner says:

    Yeah but sorry Rummy we need a post-war plan if you want to invade Iraq.

    • Greg Forster says:

      Like I said in our first exchange about this:

      1) Rumsfeld had a plan. The problem was that so did everyone else in the administration, and all the plans were different, and Bush didn’t impose order. Buck stops at the top on this one, I regret to say.

      2) Global politics is something that the US Constitution is indeed poorly designed for. No slight on the founders to say they failed to anticipate ICBMs and suitcase nukes. But your raising this example only reinforces my point that you’re asking the US system to operate like a European one.

  3. matthewladner says:

    We are giving it a go here in AZ. It may succeed or it may fail but it strikes me as the most responsible way to proceed. As we discussed in that previous post, Texas Education Agency Commish Scott was absolutely 100% right not to want to adopt sight unseen a few years ago, but that cuts the other way now.

    • Greg Forster says:

      It’s preferable if you can do it, but if you can’t (and most states can’t because the necessary consensus just isn’t there) then the real choice is between standards we know are bad and uncertainty. Which of those you think is worse depends on how bad you think the current standards are and how much tolerance you have for uncertainty. But there’s no reason uncertainty should be viewed as always the wrong choice.

      • matthewladner says:

        Which standards is it that we know are bad? These rankings of standards are all purely subjective are they not?

        What we know something about are the tests. Mississippi used to declare you proficient in 4th grade reading with the equivalent of a 163 on the NAEP. Now they don’t. This is far more important than any of the beauty contest stuff over my standards are prettier than yours oh no they’re not in my book.

      • Greg Forster says:

        Standards can be bad for other reasons besides not producing higher test scores. Tests are not the be all and end all metric for what is a good education. The emphasis on “informational texts” to the detriment of literature, for example, might be considered bad regardless of its effect on test scores.

      • Ze'ev Wurman says:

        And if anyone truly believes in diversity, forcing uniform standards and curricula across our huge nation is probably a mistake on its own, not even to mention issues of federalism. Children are not like electric outlets, no matter what Bill Gates thinks.

        Education should not be geared to make bureaucrats and education researchers happy, but to make parents and children happy.

      • matthewladner says:

        Yes but “might” is the key term there. There have been many cries of “Wolf!!!!!” in all of this, including the dummy down story addressed in the charts.

  4. Ze'ev Wurman says:


    As you keep making victory laps on this issue and proudly display “Honesty Gap” graphics, I point you to their main web page:

    “We are on the right road to fixing this problem. Today, many states have taken the steps needed to address the Honesty Gap – mainly, the adoption of rigorous, comparable standards and high-quality assessments that give parents real information. And this year’s results show that it’s working.”

    Rigorous standards? This issue speaks nothing about the rigor of the standards. It speaks only to test difficulty. High quality assessment? The quality of the assessment is questionable (and widely questioned). The relative uniformity of the yardstick — test difficulty — is what is being discussed, not its quality or precision. I find the name Honesty Gap a bit ironic in this context.

    Finally, and as I alluded to yesterday, there are two real issues here even with the uniformity of the “test difficulty” measure.

    First is that since all the PARCC states gave the same identical test in grades 4 & 8, their test difficulty should have been found all identical. Yet Education Next assigns those states grades from A to C+, some 25 NAEP percentage points span. Similarly, the SBAC states are assigned grades A through B, some 15-20 NAEP points span. What this shows is that the underlying assumption that performance distribution functions of states versus NAEP and versus their own tests are similar is *incorrect*. Consequently, much of the supposed transparency is questionable.

    Second issue is whether these tests provide meaningful scores for high school kids. After all, that is really what is important, rather than the early grades. There we have no NAEP as a reference point, as shaky as it may be. But we do have the overnight more than doubling of California SBAC-declared college-ready students in Math, and overnight tripling of conditionally college-ready in English. We also have the complaint of college math chairs in Kentucky that Common Core students enter college less prepared than before. If you believe this is an improvement engendered by Common Core I have a bridge to sell you.

    So I repeat the question: How transparent and how meaningful are the grades given by the Common Core tests?

    • matthewladner says:


      Why should I care about uniformity? The states are still setting cut scores so outside of a few people huffing heavily on CC bongs in dorm room bull sessions there was no reason to ever expect uniformity.

      • Greg Forster says:

        “A few people”? I have to throw a flag on that one. Foolish as it was to expect uniformity, a lot of people did.

      • matthewladner says:

        Ok so lots of people were huffing on a CC bong in dorm rooms then eh? Two different tests turning to multiple different tests was a big hint, but from the moment that it became clear where the cut scores were being set it was obvious that there would not be uniformity. Yes, many people had the exaggerated notion that uniformity was a “selling point” but I can’t imagine who would be buying it or even why they thought it was important.

      • Ze'ev Wurman says:

        I will not go into arguing how much test uniformity one can (or should) realistically expect, except to note that that was the BIGGEST argument for federally sponsored national testing. If that argument was based on a lie — which you seem to indicate — then I think it is worth knowing.

        In reality, it was — and is — trivial to make a uniform yardstick with essentially ZERO effort. Cross-linking between NAEP and state tests has been around since 2005 and the feds could insist that states ALWAYS also publish such NAEP-linked results wherever state-only results are published. In an instant, all state report cards would be comparable across states with zero cost and zero effort. And it didn’t even need legislative changes — a simple ED regulation would have done.

        But the truth is that federally-sponsored tests were NEVER about comparability. In fact, I am still waiting for a single parent to approach me and complain that she can’t compare the grades of her kid in the local school with the grades of some other school across the continent. Federal tests were supposed to be from day one the stick that will force UNIFORMITY OF THE CURRICULUM across the land. Comparability was just a convenient lie.

        I, for one, am happy to break this stick in any way I can.

      • matthewladner says:

        Well that’s the problem isn’t it? In “breaking this stick any way you can” I believe that a large number of anti-CC crusaders knowingly embraced claims that fell somewhere on the speculative-absurd-outright lie spectrum. Peterson’s article blasted the credibility of those willing to engage in this sort of contemptible rabble rousing into permanent shadows.

      • Ze'ev Wurman says:

        I’d rather we avoid discussions whose claims lie on the “speculative-absurd-outright lie spectrum” but, since you started, let me answer this with a few questions.

        – Where does the argument that CC was a “state effort” lie on this spectrum?

        – Where does the argument that there was a “race to the bottom” that supposedly necessitated the Common Core lie?

        – Where does the argument that CC “does not enforce curriculum” and that federally-funded tests do not enforce curriculum lie on this spectrum, given that both consortia were paid by the feds to develop *curricular* components, and that many of their test items explicitly assess pedagogical approaches rather than essential knowledge?

        – Where on this spectrum lie the claims that CC “already increased achievement” that we have heard from ED and other CC promoters with the release of 2013 NAEP, while objecting to any possible causality to the 2015 NAEP results?

        – Where on this spectrum lies a claim such as Mike Petrilli’s before the 2015 NAEP release suggesting that declines might be associated with the recession and hitting precisely six years prior? To his credit he at least quickly withdrew it afterward.

        The Peterson article focused on a very narrow — and quite unreliable, as it turns out — issue of uniformity of test difficulty across the states. Nothing wrong with it except, perhaps, some wording issues that may mislead casual readers. How does this “blast” anyone’s credibility, and who are those “willing to engage in this sort of contemptible rabble rousing” is beyond me.

        Perhaps we read different articles.

      • matthewladner says:


        “The Devil Made Me Do It” does not work as an excuse. For instance, those of us who fight for parental choice face opponents who often show no regard for either the truth or simple logic. That does not make it either justified or wise for us to adopt their tactics.

      • Ze'ev Wurman says:


        I don’t know where you picked up the idea that the devil made me do it. Reading this, and your response to Sandy Stotsky, I worry that, perhaps, it is something in the water in your area.

        The devil — believers in central federal command and control — did it. “It” being mediocre national standards, mediocre national tests, trampling on federalism. Having seen this play before in the 1990s, I feel I must resist. So perhaps you’re right, even if not in the way you meant it. The devil did force me to react (smile).

      • matthewladner says:

        Resist all you like but when you lie with dogs you get fleas. It would take all of five minutes to google multiple passionate assertions that would lead one to believe that the exact opposite of what we see in the charts in this post would be happening now. So far your response seems to be “look-squirrel” or “they did it first!”

        Don’t get me wrong- I wouldn’t buy a used car from more than a few CC people either. My only interest is in tests that don’t systematically deceive the public about the state of academic knowledge in our schools. I couldn’t care less what happens to CC per se, and I couldn’t care less about Fordham vs. Ze’ev standards beauty contests (Miss Texas is the best! No I like Miss Rhode Island better! FOOD FIGHT!!!!)

        As of right now in my book they got rid of a bunch of Mississippi NAEP 163 tests, and you guys have produced what looks like (for now) a transparency setback in Oklahoma. Mind you I would have greatly preferred to get rid of Mississippi 163 in a way that was more sustainable and more in keeping with American tradition and practices. Alas no one asked and we have little choice but to play through this absurd chapter.

  5. sstotsky says:

    The Ed Next folks have a serious “honesty gap” on more than the relationship of state standards to NAEP. They all chimed in for annual testing without a shred of evidence that it helps anyone–and they got it frozen into ESSA. Will cost millions that could be used in other ways to benefit kids.

    • matthewladner says:

      Dr. Stotsky-

      I am entirely open to the idea that other policy changes relate to MA improvements, and have written in the past that researchers show a strange lack of curiosity when it comes to teasing out these possibilities when it comes to MA. We for instance have a series of studies evaluating individual pieces of the Florida reform effort, but strangely most people assume that the MA reforms were entirely due to testing without investigation. So I am with you there.

      I however took a peek at the NAEP trends for MA low-income kids from 2005 to 2015 and found gains of 9 points, 4 points, 5 points and 6 points on 4r, 8r, 4m and 8m respectively. While I agree that there is no research established link between these gains and annual testing, it strikes me as even more wildly speculative to say that these children may have been academically harmed by annual testing. It could be that the gains would have been even larger without annual testing but I can’t see any reason to think this actually happened.

      Finally while Paul and company are delightfully influential as policy wonks go, I feel entirely confident in saying that the United States Congress does not move according to their wishes with robotic like precision. I believe that federal education policy like most everything else is set as the result of a pluralistic free for all between contending interests. Ravitch’s friends in the unions paid for plenty of representation in the process for instance, but so too did many others.

  6. sstotsky says:


    In reference to a central issue you have further opened:

    “I however took a peek at the NAEP trends for MA low-income kids from 2005 to 2015 and found gains of 9 points, 4 points, 5 points and 6 points on 4r, 8r, 4m and 8m respectively. While I agree that there is no research-established link between these gains and annual testing, it strikes me as even more wildly speculative to say that these children may have been academically harmed by annual testing. It could be that the gains would have been even larger without annual testing but I can’t see any reason to think this actually happened.”

    I don’t know if it is possible to find out from NAEP scores/trends more on what Jeff Nellhaus (now director of PARCC assessments) told me in an e-mail on Friday, February 22, 2008 (when he was still at the MA DoE) about the performance of minority groups on MCAS. I was on the state board of education at the time. He said he had originally offered these comments during a budget hearing with state legislators:

    “I pointed out that while the gap is wide and has not narrowed very much, that the performance of various student groups has improved quite dramatically over time. For example, in 2001 only about 15% of black and Latino tenth graders scored at the proficient/advanced levels on the MCAS math test. The percentages rose to about 45% in 2007, a three-fold increase in the percent proficient/advanced. Interestingly, the current (2007) percentage of black/Latino 10th graders who are proficient/advanced (45%) is only slightly below the percentage of white students who were proficient/advanced in 2001 (50%).”

    This suggests to me that if there has been a deceleration in the % improvement for minority groups after 2007 on either MCAS or NAEP scores, it would be possible to say that annual testing harmed, not helped, them. Can you get earlier NAEP information for low-income kids (I realize the category of low-income is not identical to minority group membership)? Nellhaus is referring to MCAS performance levels, and to grade 10 only (which is the most important grade of all in the testing system). Sandra

  7. matthewladner says:

    Dr. Stotsky-

    I just looked at the 4th grade reading numbers for FRL eligible kids in MA. Between 1998 (the earliest available measure) and 2005 they improved 8 points. Between 2005 and 2015 they improved 9 points.

    • Ze'ev Wurman says:

      Two observations.

      First, a back of the envelope calculation based on your numbers shows that between 1998 and 2005 FRL kids improved at ~1.15 points per year, while thereafter they slowed down to ~0.9 points/year. That’s about 25% slowdown.

      But there is a much bigger issue with ESSA’s demand for annual testing of all kids in grades 3-8. It made sense to demand this if the stated goal was reaching proficiency for ALL students, and if the scores were to be used in individual teacher evaluation. Both activities benefit from annual testing in that they offer scores for all kids and for most teachers in grades 3-8, and also allow for value-added scores that are more meaningful than status scores for those purposes.

      Yet ESSA dispensed with both national “proficiency for all” and scores’ use for teacher evaluation. In that case, why the demand for annual testing? What is wrong with grade-span testing, or with every other grade testing as Mass. used to have? Both can easily support identification of weak schools and ignoring demographic subgroups, while greatly reducing teaching to the test and time wasted on testing. Why should we accept ESSA’s *demand* for annual testing? Just to satisfy education researchers and bureaucrats?

      I guess I should note here that *states* should be able to define more demanding accountability systems that may require annual testing. Yet the federal demand that ALL states test annually, while removing the reasons for it, strikes me as wrong-headed, bureaucratic, and counter-educational. On this point I agree with Ravitch.
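The back-of-the-envelope rates above can be reproduced in a few lines. This is only a sketch using the two figures quoted in this thread (8 points from 1998 to 2005, 9 points from 2005 to 2015); the function name `points_per_year` is made up for illustration. Note that the size of the "slowdown" depends on which rate is used as the base, so the roughly 25% figure falls between the two ways of computing it.

```python
def points_per_year(gain, start_year, end_year):
    """Average NAEP scale-score gain per calendar year over a span."""
    return gain / (end_year - start_year)

# Figures quoted in the thread for MA 4th grade reading, FRL-eligible students.
early = points_per_year(8, 1998, 2005)   # 8 points over 7 years
late = points_per_year(9, 2005, 2015)    # 9 points over 10 years

# Slowdown measured against the earlier rate.
slowdown = (early - late) / early
# How much faster the earlier rate was, measured against the later rate.
earlier_faster_by = (early - late) / late

print(f"early: {early:.2f} pts/yr, late: {late:.2f} pts/yr")
print(f"slowdown vs. early rate: {slowdown:.1%}")
print(f"early rate faster than late rate by: {earlier_faster_by:.1%}")
```

Averaging over spans like this says nothing about causes; it only restates the two endpoints as yearly rates.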

    • sstotsky says:

      Can you find any information for grade 8 or high school? And for demographic groups? FRL kids has begun to include almost everyone or huge numbers of schools; it has such an elastic definition. Stick with AA and Latinos. Scores are disaggregated for those groups to this day.

      • sstotsky says:

        I have some spreadsheets, mainly from NAEP, that may be grist for the mill of whoever knows how to sort through these data to find out if minority group achievement in MA decelerated after annual testing began in 2006. Most have data only on grades 4 and 8.

  8. matthewladner says:

    You can look up NAEP data here:

    In math it is clear that poor kids made large gains between 1996 and 2005, slower gains afterwards. The MA reform package simply running out of steam strikes me as a far more plausible explanation for this trend than annual testing, but both explanations are purely speculative without research.

    • sstotsky says:

      I believe that NAEP scores showed a gain in math only for minority kids–and only in grade 4. It didn’t, as I recall, show any gains in reading. Yet, minority kids in reading were improving, as were the others. As Helen Ladd concluded in her comments on a 2010 Brookings Institution paper by Thomas Dee and Brian Jacob:
      “… First, the null findings for reading indicate to me that to the extent that higher reading scores are an important goal for the country, NCLB is clearly not the right approach. That raises the obvious follow-up question: what is?” “[T]he suggestive evidence that I have included here on Massachusetts [indicates] that states may be in a better position to promote student achievement than the federal government.”

      • sstotsky says:

        Sorry I wasn’t clearer. Minority kids in MA in reading were improving. The most important question is whether the gains carried through to high school in math and reading, but NAEP doesn’t measure high school grades very often.

  9. sstotsky says:

    Correlating NAEP data with changes in testing practices across a number of states that were doing gradespan testing before NCLB would be more illuminating than I thought. It seems that the biggest gains in MA occurred before 2007; annual testing started in 2006. But no one looked to see if minority groups were decelerating since 2005 since MA kids hit the jackpot in 2005 and kept improving in terms of overall averages.

    Tuesday, September 25, 2007
    MA Outscores Every Other State on NAEP Exams Again
    Students Rank First on Three Exams, Tie for First on Fourth

    LYNN – For the second time, Massachusetts has outscored every other state in the country on three of four National Assessment of Educational Progress (NAEP) exams, and has tied for first on the fourth, Governor Patrick announced at the Aborn Elementary School in Lynn on Tuesday.

    The only other time one state has ever ranked first on all four NAEP exams was when Massachusetts outscored the nation for the first time in 2005.

    In 2007, Massachusetts’ 4th graders ranked first nationwide on the reading and math exams, and the state’s eighth graders ranked first in math and tied for first with Montana, New Jersey and Vermont in reading. In 2005, the Commonwealth ranked first on reading at grades 4 and 8, and tied for first in mathematics at grades 4 and 8. …

    Results show that since 2005, the last year NAEP was administered, Massachusetts students made significant gains in three of the four exams: in grade 4 Math, 58 percent scored Proficient or above, up from 49 percent in 2005; in reading, 49 percent scored Proficient or above, up from 44 percent. In grade 8 math, 51 percent scored Proficient or above, up from 43 percent in 2005; in reading, 43 percent scored in the top two categories, down slightly from 44 percent.

    Prior to 2007, no state ever had more than 49 percent of its students score Proficient and above on any NAEP test. Massachusetts was one of five states this year to surpass the 50 percent threshold in grade 4 math, and the only state to surpass that mark in grade 8 Math….

    In reading, fourth graders had an average scaled score of 236, compared to the national average of 220; in math the state’s fourth graders averaged 252, well above the national average of 239. Eighth graders also surpassed their peers nationwide: the state’s students averaged 273 in reading, as compared to the national average of 261; and in math eighth graders averaged 298, well above the national average of 280.

    Despite the overall gains, an achievement gap was still evident in the state’s results, meaning that not all student subgroups made significant gains between 2005 and 2007. Hispanic students made some gains in grade 4, but showed flat results in grade 8; results for Black students were not statistically different in 2007 than in 2005.

    In addition, students with disabilities made gains on only grade 4 Math, and scores for limited English proficient students were flat across the board….

    • Male students in Massachusetts scored higher than female students in 2007 in math at both grades 4 and 8. At grade 4, male students had an average scaled score of 254, compared to 251 for female students. At grade 8, male students had an average scaled score of 300, compared to 296 for female students.
    • Female students outscored male students in Massachusetts in reading at both grades in 2007. Female students scored on average 238 in reading at grade 4, compared to 233 for male students. At grade 8, female students outscored male students, 278 to 269.
    • Massachusetts’ students eligible for free and reduced-price lunch made significant gains at grade 4, where their average scaled score in reading rose from 211 in 2005 to 214 in 2007, and in mathematics rose from 231 in 2005 to 237 in 2007.
    • Suburban and rural students outscored urban students on all four NAEP exams.

    More than 18,800 Massachusetts public school students from 167 schools at grade 4 and 135 schools at grade 8 took a 50-minute NAEP test in reading, mathematics, or writing (grade 8 only) in 2007. The schools that participated – and the students who were tested in these schools – are selected at random. Results for writing will be released in spring 2008.

    Additional information on NAEP is available on the Department’s Web site at:

    07read_math.pdf

  10. Ze'ev Wurman says:

    Speculation is fine, but one should also pay attention to the data. For FRL Reading students there seems to be a pronounced instability in the last 2 administrations of NAEP in Mass. Grade 4 dropped by 5 points in 2013 and recovered 7 points in 2015 — rather large swings. Grade 8 peaked in 2013 and stayed unchanged in 2015. With a single small exception (R8 in 2009), Reading scores for FRL have been monotonically increasing since 2003. In Math, FRL scores are more stable and show clear peaking in 2011 (4th grade) and in 2013 (8th grade) and drops of 2-3 points in 2015. Until that peaking, Math scores have been monotonically increasing since 2003. These data do not seem very supportive of the “running out of steam” speculation.

    One might add that on 2013 grade 12 NAEP, given only once in 4 years, Mass. was one of the only two states that showed declines since 2009. In fact, it was the only state that showed decline in *both* Reading and Math. No FRL scores are available.

  11. sstotsky says:

    A major point of much of the above discussion is that ESSA has frozen in testing practices, among other things, that could be argued (based on comparative data available without extensive research) to damage low-income kids. It can’t be proven definitively, as is the case with almost everything in education. But do the sponsors of ESSA care about the possibly damaging practices they froze in? Do the writers of ESSA (still publicly unknown) and the folks that paid the writers (still publicly unknown) care? No, so far as I can see. Nor does the research community tapped by Education Next.

    The bottom line should be skepticism if not cynicism about any claims emerging in the pages of a journal partially subsidized by grants from our major “benevolent philanthropist.” The next question is: Where do we go to get support for public policy-making of any kind? Allow parents and teachers to vote? That’s what we are now fighting for in MA. And the best argument by the advocates and supporters of Common Core is to try to declare the ballot question illegal. If anything shows the bankruptcy of the entire CC project, it is this battle.

    Again, no argument based on unquestioned evidence or even by higher ed experts in MA (of which there are dozens capable of participating if asked). The battle is purely political.

    • matthewladner says:

      Dr. Stotsky-

      If I am following you correctly, you suspect that annual testing harms low-income children and that somehow the Ed Next network is responsible for keeping it in federal law.

      Your suspicion may or may not be accurate but I’ve not only never seen any evidence to support it, I’ve never seen anyone voice the suspicion before now. Even if it were true that annual testing harms poor children (we currently lack any evidence or even a plausible theory to lead us to believe it does) and that somehow Paul Peterson is responsible for keeping it in place in federal law (the most casual observer of Congress would scoff at the notion) it is still the case that each and every article in Education Next ought to be judged on its own merits, or lack thereof, imo.

  12. sstotsky says:

    Matt, sorry that you see exaggeration as the mode of response in what I hoped would be a rational discussion.
    First, the fact that you claim not to have seen the argument that annual testing harms poor children is an odd one. A lot of folks (parents, teachers, unions) were voicing concerns about annual testing for years. And some of them did refer to poor children, in particular.

    Second, I have proposed the theory, and I’ve provided some evidence for it. It is not at the suspicion stage, but at the theoretical stage. The theory is quite plausible to those who know classrooms.

    Third, the burden of proof should be on those who proposed annual grade by grade testing and those who continued it. Did they have any evidence to support a policy that was NOT popular in the classroom, by its victims (teachers and parents and children)?

    Fourth, you stated above that states will set cut scores. Where do you see this? Who is the state, any state, is to set the cut score? Who set them in AZ in the past? Do you know?

  13. sstotsky says:

    Last sentence had a typo. It should read: “Who in the state, any state, is to set the cut score? (Or set the cut scores in 2015?)”

  14. matthewladner says:

    Authority over state assessments lies with the Arizona State Board of Education here in the cactus patch. My understanding is that they contracted with AIR for an assessment and set the cut scores. When the contract with AIR expires they will be free to either renew or contract with another firm.

    Most of the country has been doing annual testing since the 1990s. All states have been doing it for over a decade. If you have a plausible theory of how it harms the education of poor children and evidence to support it, feel free to share it. I’m open to the possibility that grade span testing is better than annual testing but the burden of proof lies with you if you propose to change current practice.

    • sstotsky says:

      Actually, most of the country hasn’t been doing annual grade-by-grade testing since the 1990s. We weren’t in MA, for sure. Most of the country wasn’t, either. That was why everyone refers to the “expansion” of testing under NCLB (after 2002). That was why researchers like Dee and Jacob (2010) looked at the results of NCLB (as we all know), because NCLB drastically changed accountability with lots more testing data, penalties for low AYP that look like help, etc.

      I quoted Helen Ladd’s observations just to show that it is widely known that NCLB accomplished little, if anything. So the burden of proof is not on me. It rests with those ed researchers who promoted a policy without evidence. And without looking for counter-evidence (which is common practice in medicine).

      I have proposed a plausible theory, and provided some evidence to support it. The response is what I find interesting and highly problematic. I am surprised that you are the only one responding on this matter, too. Sandra
