Drill and Kill Kerfuffle

The reactions of New York Times reporter Sam Dillon and LA Times reporter Jason Felch to my post on Monday about erroneous claims in their coverage of a new Gates report could not have been more different.  Felch said he would look into the issue, discovered that the claimed negative relationship between test prep and value-added was inaccurate, and is now working on a correction with his editors.

Sam Dillon took a very different tack.  His reaction was to believe that the blog post was “suggesting on the internet that I had misinterpreted an interview, and then you repeated the same thing about the Los Angeles Times. That was just a sloppy and irresponsible error.”  I’m not sure how Dillon jumps to this thin-skinned defensiveness when I clearly said I did not know where the error was made: “I don’t know whether something got lost in the translation between the researchers and Gates education chief, Vicki Phillips, or between her and Sam Dillon at the New York Times, but the article contains a false claim that needs to be corrected before it is used to push changes in education policy and practice.”

But more importantly, Dillon failed to check the accuracy of the disputed claim with independent experts.  Instead, he simply reconfirmed the claim with Gates officials: “For your information, I contacted the Gates Foundation after our correspondence and asked them if I had misquoted or in any way misinterpreted either Vicki Phillips, or their report on their research. They said, ‘absolutely not, you got it exactly right.'”

He went on to call my efforts to correct the claim “pathetic, sloppy, and lazy, and by the way an insult.”  I guess Dillon thinks that being a reporter for the New York Times means never having to say you’re sorry — or consult independent experts to resolve a disputed claim.

If Dillon wasn’t going to check with independent experts, I decided that I should — just to make sure that I was right in saying that the claims in the NYT and LAT coverage were unsupported by the findings in the Gates report.

Just to review, here is what Dillon wrote in the New York Times: “One notable early finding, Ms. Phillips said, is that teachers who incessantly drill their students to prepare for standardized tests tend to have lower value-added learning gains than those who simply work their way methodically through the key concepts of literacy and mathematics.”  And here is what Jason Felch wrote in the LA Times: “But the study found that teachers whose students said they ‘taught to the test’ were, on average, lower performers on value-added measures than their peers, not higher.”  Yet the correlations in the Gates report between student reports of test prep and value-added on standardized tests were all positive: “We spend a lot of time in this class practicing for the state test.” (ρ=0.195), “I have learned a lot this year about the state test.” (ρ=0.143), “Getting ready for the state test takes a lot of time in our class.” (ρ=0.103).  The report does not actually contain items that specifically mention “drill,” “work their way methodically through the key concepts of literacy and mathematics,” or “taught to the test,” but I believe the reporters (and perhaps Gates officials) are referencing the test prep items with these phrases.

I sent links to the coverage and the Gates report to a half-dozen leading economists to ask if the claims mentioned above were supported by the findings.  The following reply from Jacob Vigdor, an economist at Duke, was fairly representative of what they said even if it was a bit more direct than most:

I looked carefully at the report and come to the same conclusion as you: these correlations are positive, not negative.  The NYT and LAT reports are both plainly inconsistent with what is written in the report.  A more accurate statement would be along the lines of “test preparation activities appear to be less important determinants of value added than [caring teachers, teacher control in the classroom, etc].”  But even this statement is subject to the caveat that pairwise correlations don’t definitively prove the importance of one factor over another.  Maybe the reporters are describing some other analysis that was not in the report (e.g., regression results that the investigators know about but do not appear in print), but even in that case they aren’t really getting the story right.  Even in that scenario, the best conclusion (given positive pairwise correlations and a hypothetically negative regression coefficient) would be that teachers who possess all these positive characteristics tend to emphasize test preparation as well.

Put another way, it’s always good to have a caring teacher who is in control of the classroom, makes learning fun, and demands a lot of her students.  Among the teachers who share these characteristics, the best ones (in terms of value added) appear to also emphasize preparation for standardized tests.  I say “appear” because one would need a full-fledged multivariate regression analysis, and not pairwise correlations, to determine this definitively.

Another leading economist, who preferred not to be named, wrote: “I looked back over the report and I think you are absolutely right!”  I’m working on getting permission to quote others, but you get the idea.

In addition to confirming that a positive correlation for the test prep items means that test prep is associated with higher value-added, not lower, several of these leading economists emphasized the inappropriateness of comparing correlations to draw conclusions about whether test prep contributes to value-added any more or less than other teacher practices observed by students.  They noted that any such comparison would require a multivariate analysis and not just a series of pairwise correlations.  And they also noted that any causal claim about the relative effectiveness of test prep would require some effort to address the endogeneity of which teachers engage in more test prep.
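
To make the statistical point concrete, here is a minimal simulated sketch in Python of how a positive pairwise correlation can coexist with a negative regression coefficient. Everything in it is a made-up assumption for illustration: the variable names, the unobserved "quality" factor, the sample size, and the coefficients. None of it comes from the Gates data or report.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_two_predictors(x1, x2, y):
    """Slopes (b1, b2) of y ~ intercept + x1 + x2, via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(n=5000, seed=0):
    """Hypothetical world: good teachers do more test prep, but test prep
    itself slightly lowers value-added once teacher quality is held fixed."""
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n)]          # assumed latent factor
    test_prep = [q + rng.gauss(0, 0.5) for q in quality]   # tracks quality
    value_added = [q - 0.3 * t + rng.gauss(0, 1)           # assumed effects
                   for q, t in zip(quality, test_prep)]
    r = pearson(test_prep, value_added)
    _, b_prep = ols_two_predictors(quality, test_prep, value_added)
    return r, b_prep


r, b_prep = simulate()
print(f"pairwise correlation(test prep, value-added): {r:.3f}")
print(f"regression coefficient on test prep, holding quality fixed: {b_prep:.3f}")
```

In this invented setup the pairwise correlation comes out positive while the coefficient on test prep is negative, which is why a table of pairwise correlations alone cannot settle whether a practice helps or hurts, or rank practices against each other.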

As David Figlio, an economist at Northwestern University, put it:

You’re certainly correct here.  A positive pairwise correlation means that these behaviors are associated with higher performance on standardized tests, not lower performance.  The only way that it could be an accurate statement that test prep is causing worse outcomes would be if there was a negative coefficient on test prep in a head-to-head competition in a regression model — though even then, one would have to worry about endogeneity: maybe teachers with worse-performing students focus more on test prep, or maybe lower-performing students perceive test prep to be more oppressive (of course, this could go the other way as well.)  But that was not the purpose or intent of the report.  The report does not present this as a head-to-head comparison, but rather to take a first look at the correlates between practice measures and classroom performance.

There was no reason for this issue to have developed into the controversy that it has. The coverage contains obvious errors that should have been corrected quickly and clearly, just as Jason Felch is doing.   Tom Kane, Vicki Phillips, and other folks at Gates should have immediately issued a clarification as soon as they were alerted to the error, which was on Monday.

And while I did not know where the error occurred when I wrote the blog post on Monday, the indications now are that there was a miscommunication between the technical people who wrote the report and non-technical folks at Gates, like Vicki Phillips and the PR staff.  In other words, Sam Dillon can relax since the mistake appears to have originated within Gates (although Dillon’s subsequent defensiveness, name-calling, and failure to check with independent experts hardly bring credit to the profession of journalism).

The sooner Gates issues a public correction, the sooner we can move beyond this dispute over what is actually a sidebar in their report and focus instead on the enormously interesting project on which they’ve embarked to improve measures of teacher effectiveness.  An apology from Sam Dillon would also be nice, but I’m not holding my breath.


15 Responses to Drill and Kill Kerfuffle

  1. Student of History says:

    Jay-

    You might want to read Paul Kengor’s new book “Dupes” paying particular attention to the New York Times’ repeated history of not wanting to check facts or correct errors when the story pushes a view they want to push.

    I think 2011 will be the year many of us involved in education start recognizing how much the “Dupers” and the “Duped” phenomenon is going on in education in the US.

  2. matthewladner says:

    Game.Set.Match

  3. concerned says:

    Jay,

    Thanks for the post!
    Please ask George Andrews to comment on the report. He is particularly interested in mathematics education research.

    Click to access Monthly.pdf

    Inquiring minds want to know his opinion!

    PS – many of us prefer the phrase “Thrill of Skill”

  4. Patrick says:

    Warning: About journalists, not education…

    I’ve noticed that journalists can be very, very touchy. I’ve had more than one nasty email from journalists in Las Vegas after I politely corrected their stories.

    One series of articles in the Las Vegas Sun on hospitals concluded that Nevada’s hospitals are putting profits before patients (the bulk of hospitals in Nevada are private, for-profit institutions, which is the opposite of the rest of the nation).

    The only data the researcher had were anecdotal stories from patients and family members of patients who suffered unnecessary trauma. Ironically, the only data provided in the article suggested that the local government run hospital had some unacceptable mortality figures for certain procedures. There was no hard data given on the for-profits.

    I looked up mortality data and patient satisfaction surveys on hospitals in Nevada and found that the U.S. Health and Human Services department found no statistically significant difference between hospitals in southern Nevada and the national average when it came to mortality, readmission rates AND patient satisfaction surveys.

    I sent the information to the reporter and he sent me back a nasty email telling me I’m “ignorant” and that since he did 150 interviews with doctors and patients – who all tend to have some sort of problem with the system – that he knows what he’s talking about.

    I blogged about it here: http://prgibbons.blogspot.com/2010/11/marshall-plan.html

    and told him that by basing an unsupported conclusion on the testimony of people who had problems with the system he’s guilty of confirmation bias, non causa pro causa, and post hoc ergo propter hoc. I never attacked his professionalism – but that is the only thing he was concerned about.

    He never retracted his personal attack, but the next article noted that Nevada’s hospitals, while facing some problems, are not statistically different than the rest of the nation.

  5. Patrick says:

    That was really long, sorry about that.

  6. Daniel Earley says:

    Unfortunately, few journalists and PR folks know how to use “multivariate regression analysis” correctly in a sentence, let alone why it matters. Ignorance and illiteracy in the fundamentals of study design and analysis form a substantial and vexing barrier to a rational dialog… hence the premature gray in my hair.

  7. You may well be right, Daniel, but the researchers who conducted the study know better. They should be pushing Vicki Phillips at Gates to issue a correction so that the reporters at the NYT, LAT, and the people in general know that the claims she made are not supported by their research.

    Instead, they seem to be pushing the line that this statement in the NYT, “‘Teaching to the test makes your students do worse on the tests,’ Ms. Phillips said,” is not far from the truth because the coefficients for test prep were generally smaller than other observed practices. As indicated in the post above, all of the experts with whom I consulted say that this is also an incorrect claim. It is not possible to compare the size of a series of pairwise correlations to determine the relative magnitude of each practice’s effect on value-added.

    Again, reporters and pr people would have a harder time knowing this, but the researchers who conducted the study should know better.

  8. Daniel Earley says:

    Indeed, Jay… indeed. Of course, this begs the question: Is the blind eye intentional? If so, for what motive? Not that we need to answer such questions, but I believe their continued silence makes it fair to ask.

  9. Florida Speaks says:

    Thanks for quoting Figlio. He also authored a study of Florida pushing students out of school which would rid the school of their test scores as well as one about the lack of improved performance when a public school student went to a voucher school.

  10. […] already have a taste of this from the preliminary report that Gates issued last month.  Following its release Vicki Phillips, the head of education at the Gates Foundation, told the […]

  11. Galton says:

    How should a professional scientist react when a study of theirs is misinterpreted? Someone should send the answer to Mr. Kane, Ms. Gates, and the folks at Harvard.

    • As it turns out, the reporters did not misinterpret the study. The Gates folks misled the reporters. We know this because the quotation from Vicki Phillips, the head of ed at Gates, contained the incorrect interpretation of the results. We also know this because the LA Times, Ed Week, and the NYT all made the same mistake — that is, they were all told the same wrong thing.

  12. James S. says:

    To be statistically accurate, it is not sufficient to report the correlations themselves, but rather, you must report both the correlation as well as a test of the significance of that correlation (e.g., a p-value or confidence interval). It is possible, for example, to have a positive correlation such as rho=0.125 but also have a 95% confidence interval that puts rho somewhere in the range of [-0.2, 0.45]. If this were the case, even though the correlation was in fact “positive” it is not statistically different from no correlation at all. In fact, it may even be negative. I don’t personally care about the issue here between Gates and the NYT — but you’ve written an article to be critical of reporting standards, so please correct this statistical deficiency. Tests for the significance of correlation statistics are part of high school statistics courses.
