Anticipating Responses from Gates

Over the weekend I posted about how I thought the Gates Foundation was spinning the results of their Measures of Effective Teaching Project to suggest that the combination of student achievement gains, student surveys, and classroom observations was the best way to have a predictive measure of teacher effectiveness. Let me anticipate some of the responses they may have:

1) They might say that they clearly admit the limitations of classroom observations and therefore are not guilty of spinning the results to inflate their importance.  They could point to p. 15 of the research paper in which they write: “When value-added data are available, classroom observations add little to the ability to predict value-added gains with other groups of students. Moreover, classroom observations are less reliable than student feedback, unless many different observations are added together.”

Response: I said in my post over the weekend that the Gates folks were careful so that nothing in the reports is technically incorrect. The distortion of their findings comes from the emphasis and manner of presentation. For example, the summary of findings in the research paper on p. 9 states: “Combining observation scores with evidence of student achievement gains and student feedback improved predictive power and reliability.” Or the “key findings” in the practitioner brief on p. 5 say: “Observations alone, even when scores from multiple observations were averaged together, were not as reliable or predictive of a teacher’s student achievement gains with another group of students as a measure that combined observations with student feedback and achievement gains on state tests.” Notice that these summaries of the results fail to mention the most straightforward and obvious finding: classroom observations are really expensive and cumbersome and yet do almost nothing to improve the predictiveness of student achievement-based measures of teacher quality.

And the proof that the results are being spun is that the media coverage uniformly repeats the incorrect claim that multiple measures are an important improvement on test scores alone.  Either all of the reporters are lousy and don’t understand the reports or the reporters are accurately repeating what they are being told and what they overwhelmingly see in the reports.  My money is on the latter explanation.

And further proof that the reporters are being spun is that Vicki Phillips, the Gates education chief, is quoted in the LA Times coverage mis-characterizing the findings: “Using these methods to evaluate teachers is ‘more predictive and powerful in combination than anything we have used as a proxy in the past,’ said Vicki Phillips, who directs the Gates project.”  This is just wrong.  As I pointed out in my previous post, the combined measure is no more predictive than student achievement by itself.

Lastly, the standard for fair and accurate reporting of results is not whether one could find any way to show that technically the description of findings is not false. We should expect the most straightforward and obvious description of findings to be emphasized. With the Gates folks I feel like I am repeatedly parsing what the meaning of the word “is” is. That’s political spin, not research.

2) They might say that classroom observations are an important addition because at least they provide diagnostic information about how teachers can improve, while test scores cannot.

Response:  This may be true, but it is not a claim supported by the Gates study.  They found that all of the different classroom observation methods they tried had very weak predictive power.  You can’t provide a lot of feedback about how to improve student achievement based on instruments that are barely correlated with gains in student achievement.  In addition, they were unable to find sub-components of the classroom observation methods that were more predictive, so they can’t tell teachers that they really need to do certain things, since those things are much more strongly related to student learning gains.  Lastly, it is simply untrue that test scores cannot be diagnostic.  There are sub-components of the tests that measure learning in different aspects of the subject.  Teachers could be told to emphasize more those areas on which their students have lagged.

3) They may say that classroom observations and student surveys improve the reliability of a teacher quality measure when combined with test scores.

Response: An increase in reliability is cold comfort for a lack of predictive power. Reliability is just an indicator of how consistent a measure is. There are plenty of measures that are very consistent but not helpful in predicting teacher quality. For example, if we asked students to rate how attractive their teacher was, we would probably get a very “reliable” (consistent) measure from year to year and section to section. But that consistency would not make up for the fact that attractiveness is unlikely to help improve the prediction of effective teaching. So, the student survey has a high amount of consistency, but who knows what that is really measuring since it is only weakly related to student learning gains. It is consistent, but consistently wrong. Our focus should be on the predictive power of teacher evaluations, and classroom observations and student surveys don’t really do anything to help with that (at least, not according to the Gates study).
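To make the distinction between reliability and predictive power concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the function names, the noise levels, and the idea of a stable "trait" rating unrelated to teaching quality are made up, and none of the numbers come from the Gates study. It only shows that a measure can correlate highly with itself across sections while telling us almost nothing about achievement gains in another section.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_teachers=2000, seed=0):
    """Hypothetical teachers, each observed in two sections (a and b)."""
    rng = random.Random(seed)
    # Assumed true teaching quality, never observed directly.
    quality = [rng.gauss(0, 1) for _ in range(n_teachers)]
    # Assumed stable trait unrelated to quality (think "attractiveness").
    trait = [rng.gauss(0, 1) for _ in range(n_teachers)]
    # Student ratings track the trait with little noise, so they repeat well.
    rating_a = [t + rng.gauss(0, 0.3) for t in trait]
    rating_b = [t + rng.gauss(0, 0.3) for t in trait]
    # Achievement gains track quality, but with a lot of noise per section.
    gains_a = [q + rng.gauss(0, 1.5) for q in quality]
    gains_b = [q + rng.gauss(0, 1.5) for q in quality]
    return {
        # Reliability: does the rating agree with itself across sections?
        "rating_reliability": pearson(rating_a, rating_b),
        # Predictive power: does the rating in one section predict gains
        # with a different group of students?
        "rating_predicts_gains": pearson(rating_a, gains_b),
        # For comparison: how well gains repeat across sections.
        "gains_reliability": pearson(gains_a, gains_b),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

Under these made-up settings the rating is far more consistent across sections than the gains are, yet it is essentially uncorrelated with gains. Adding such a measure to a composite would raise the composite's reliability without making it any better at predicting student learning.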

4) They may say that classroom observations and student surveys improve on the prediction of student effort and classroom environment.

Response: As I mentioned in the post over the weekend, they don’t really have validated measures of student effort and classroom environment. The Gates folks took a lot of flak last year for focusing on test-score gains, so they came up with some non-test score outcome measures simply by taking some of the items from the student survey where students are asked about their effort or classroom environment. We have no idea whether they have really measured the amount of effort students exert or the quality of the classroom environment; they are just using some survey answers on those items and claiming that they have measured those “outcomes.” The only validated outcome measure we have in the Gates study is the test score gains, so we have to focus on that.


The good news is that my fears about the Gates study being used to dictate what teachers do have not been realized, at least not yet. But it wasn’t for lack of trying. If the classroom observations had worked a little better in predicting student learning gains, I’m sure we would have heard about how teachers should run their classrooms to produce greater gains. But the classroom observations were so much of a dud that Gates education chief, Vicki Phillips, didn’t even attempt to claim that they found that drill and kill is bad or that teachers should avoid teaching to the test.

But the inability to use the classroom observations to tell teachers the “right” way of teaching is another way of saying that the classroom observations are not able to be used for diagnostic purposes.  The most straightforward reading of the Gates results is that classroom observations appear to be an expensive and ineffective dud.  But it’s hard for an organization that spends $45 million on a project to scientifically validate classroom observations to admit that it failed.   It’s hard enough for a third-party evaluator to say that, let alone an in-house study about a key aspect of the Gates policy agenda.

12 Responses to Anticipating Responses from Gates

  1. Joe in LA - slowly leaving the Republican Party says:

    1) Why do you NEVER address student or parent behavior? My so-called “effectiveness” is more determined by them than by me.

    2) Observations are necessary NOT to validate your need for data, but because that’s the mode of interaction through which teachers will improve.

    3) Most teachers will tell you of course observations add nothing to the validity of the data because the observers are backboneless fools, and incompetents a.k.a. administrators. THEY are major contributors to the failings of schools. But why would anyone in the media know this; you spend no time in the trenches. It’s like you’re watching the old M*A*S*H* tv show and evaluating the doctors on whether or not their patients leave prepared to get admitted to Princeton. (The mixed metaphor is intentional.) YOU’RE ON THE WRONG SIDE! You’re attacking the only glue holding the place together!!! Email me if you want to learn the truth about public schools. My email addy is in your blog’s system.

    • Hi Joe — In the last two posts I’m not addressing the wisdom of using test scores to measure student learning. I’m just focusing on whether the Gates folks are accurately describing their own results. Their results use test scores to measure student learning, so I am working within that framework.

      • Joe in LA - slowly leaving the Republican Party says:

        1) Like the Gates report you are technically correct but disingenuous. Your criticism, by implication, leaves the impression that value-added methods should be the sole method of evaluation.

        2) As you and Gates repeatedly ignore the real sources of our schools’ demise, and focus on the largely irrelevant, should I fail to point that out (and are you excused from defending the omission) because it lies outside your framework?

    • Mike Mcdonald says:

      Agreed. Teachers are an easy target, and in an impossible situation. Why are talented young people going to want to become educators again? Can someone remind me? It seems like a pretty thankless job from the outside. (Of course those who have experience know the amazing rewards of being a teacher; however, those rewards are not monetary, and money is what draws prospective employees.)

      • Joe in LA - slowly leaving the Republican Party says:

        The non-monetary rewards lose efficacy rather quickly. That is why you see a large number of teachers leave teaching in their 3rd-5th year of teaching.

      • Danaher M. Dempsey, Jr. says:

        Joe in LA,

        The current “ED” USA situation is such that direction from leadership seems haphazard and hardly based on evidence. It is hard to be young, underpaid, and in a career that is apparently headed nowhere.

        The Race to the Top is not based on evidence of what works best. The Common Core State Standards are not internationally benchmarked. Most of the nation has a reality gap, when it comes to education.

        We still have Colleges of Education instructing future teachers on methods that the CoE would like to have work …. rather than providing instruction on teaching practices that are known to work. …. The frustration of being locked into unproductive systems … is another reason to leave.

  2. Danaher M. Dempsey, Jr. says:

    So while Gates Foundation spent $45 million to Gather feedback on Effective Teaching……

    It appears the Foundation has little interest in whether materials and instructional practices prescribed by a District influence student achievement.

    Notice that the Districts used in the study were almost all current or former users of Everyday Math. ……… I would not be spending so much time on teachers actions and results … when most of the k-5 teachers are forced to use substandard materials.
    Districts in the study :::

    Charlotte-Mecklenburg – EDM is the text

    Dallas — was using EDM then came the Nov 2007 decision from the State on EDM use.
    The vote leaves some doors open for Everyday Math. As long as Texas districts use their own money, and none from the state, they can still purchase it, and they can still use state funds to purchase first, second, fourth, and fifth-grade Everyday Math textbooks.

    Denver – In the Rocky Mountain News April 2007 came the report that EDM and CMP were being used and that CMP was a huge flop in Denver middle schools.

    Hillsborough – ???

    Memphis — EDM

    New York City — EDM became the district’s choice when Bloomberg took over in 2003

    Pittsburgh – EDM used since 1993-1994

    — Dan

    • Student of History says:


      Did you see the story within the last day or so that the Dana Center would be preparing the math learning tasks for the PARCC consortium?

      Time to get that film back out on how to mislead parents starring Treisman.

      • Danaher M. Dempsey, Jr. says:

        Yes Student of History …. Dana Center in WA State has quite a reputation. The SPI “gave” as in awarded Treisman and friends the “Extremely” high bid contract to develop the new WA 2008 Math Standards… Then Dana Center failed and the legislature passed the math standards task to Strategic Teaching….. Yet in the WA Legislature’s 2010 discussions … Stand for Children lauded the Dana Center for its math work.

  3. Danaher M. Dempsey, Jr. says:

    The Smarter Balanced Assessment consortium is led by Joe Willhoft, who was a WA SPI mainstay during the math mess with Dana. So with PARCC on board with Dana …. the Common Core State Standards are likely to be really lousy on the assessment end.

    • Student of History says:

      The assessments are the whole point of Common Core just like the New Standards Project was a big part of Goals 2000. If you push the Goals 2000 policies and practices with any kind of objective test of knowledge, even a weak CRCT with low levels of proficiency, you get the Atlanta cheating scandal.

      • Danaher M. Dempsey, Jr. says:

        About the Atlanta cheating scandal.

        I believe there is a lot more cheating taking place than is ever discovered or reported. Just because it never made the news …. doesn’t mean it never happened.

        Most incest is never reported…. lots of things are similar.
