SNL Parodies Progressive Ed

January 18, 2012

For a higher quality version that I cannot embed, click here.


Why Do Reporters Get it Wrong?

January 13, 2012

It’s really frustrating, but some reporters continue to misrepresent the scholarly literature on the effects of private school choice programs.  We devoted an entire chapter in Education Myths to debunking “The Inconclusive Research Myth.”  But like an undead vampire that won’t die even after you’ve driven a stake through its heart, reporters keep repeating as fact things like the following:

Studies have generally found no clear advantage in academic achievement for students attending private
schools with vouchers.

That statement was the conclusion of the famously unreliable and partisan Center on Education Policy.  And reporter Tom Toch embraced it as an accurate summary of voucher research in his recent article in the Kappan.  What do we have to do to stop reporters from repeating this falsehood?

This blog post from Adam Emerson at the newly launched Fordham blog, Choice Words, is a great start.  Here’s a taste:

School voucher critics generally approach their job reviewing the research on school choice with unfair assumptions, and otherwise insightful commentators risk recycling old canards. This is true with Thomas Toch’s critique of vouchers in the newest edition of Kappan, which concludes that voucher programs haven’t shown enough impact to justify their position in a large-scale reform effort. Questions of scale can lead to legitimate debate, but we’ll get nowhere until we acknowledge what’s in the literature.

And Adam doesn’t even reference all of the gold standard (random assignment) research showing positive effects for students who participate in voucher programs, not to mention all of the rigorous studies finding that entire school systems improve in response to vouchers.

So why do people like Tom Toch, who’s not stupid or mean, fail to acknowledge this wealth of evidence showing benefits from voucher programs and just focus on crappy and mistaken summaries from hacks at CEP?


Beatles With Lightsabers — Simply Awesome

January 11, 2012


Anticipating Responses from Gates

January 9, 2012

Over the weekend I posted about how I thought the Gates Foundation was spinning the results of their Measuring Effective Teachers Project to suggest that the combination of student achievement gains, student surveys, and classroom observations was the best way to have a predictive measure of teacher effectiveness.  Let me anticipate some of the responses they may have:

1) They might say that they clearly admit the limitations of classroom observations and therefore are not guilty of spinning the results to inflate their importance.  They could point to p. 15 of the research paper in which they write: “When value-added data are available, classroom observations add little to the ability to predict value-added gains with other groups of students. Moreover, classroom observations are less reliable than student feedback, unless many different observations are added together.”

Response: I said in my post over the weekend that the Gates folks were careful so that nothing in the reports is technically incorrect.  The distortion of their findings comes from the emphasis and manner of presentation.  For example, the summary of findings in the research paper on p. 9 states: “Combining observation scores with evidence of student achievement gains and student feedback improved predictive power and reliability.”  Or the “key findings” in the practitioner brief on p. 5 say: “Observations alone, even when scores from multiple observations were averaged together, were not as reliable or predictive of a teacher’s student achievement gains with another group of students as a measure that combined observations with student feedback and achievement gains on state tests.”  Notice that these summaries of the results fail to mention the most straightforward and obvious finding: classroom observations are really expensive and cumbersome and yet do almost nothing to improve the predictiveness of student achievement-based measures of teacher quality.

And the proof that the results are being spun is that the media coverage uniformly repeats the incorrect claim that multiple measures are an important improvement on test scores alone.  Either all of the reporters are lousy and don’t understand the reports or the reporters are accurately repeating what they are being told and what they overwhelmingly see in the reports.  My money is on the latter explanation.

And further proof that the reporters are being spun is that Vicki Phillips, the Gates education chief, is quoted in the LA Times coverage mischaracterizing the findings: “Using these methods to evaluate teachers is ‘more predictive and powerful in combination than anything we have used as a proxy in the past,’ said Vicki Phillips, who directs the Gates project.”  This is just wrong.  As I pointed out in my previous post, the combined measure is no more predictive than student achievement by itself.

Lastly, the standard for fair and accurate reporting of results is not whether one could find any way to show that technically the description of findings is not false.  We should expect the most straightforward and obvious description of findings emphasized.  With the Gates folks I feel like I am repeatedly parsing what the meaning of the word “is” is.  That’s political spin, not research.

2) They might say that classroom observations are an important addition because at least they provide diagnostic information about how teachers can improve, while test scores cannot.

Response:  This may be true, but it is not a claim supported by the Gates study.  They found that all of the different classroom observation methods they tried had very weak predictive power.  You can’t provide a lot of feedback about how to improve student achievement based on instruments that are barely correlated with gains in student achievement.  In addition, they were unable to find sub-components of the classroom observation methods that were more predictive, so they can’t tell teachers that they really need to do certain things on the grounds that those things are much more strongly related to student learning gains.  Lastly, it is simply untrue that test scores cannot be diagnostic.  There are sub-components of the tests that measure learning in different aspects of the subject.  Teachers could be told to put more emphasis on the areas where their students have lagged.

3) They may say that classroom observations and student surveys improve the reliability of a teacher quality measure when combined with test scores.

Response: An increase in reliability is cold comfort for a lack of predictive power.  Reliability is just an indicator of how consistent a measure is.  There are plenty of measures that are very consistent but not helpful in predicting teacher quality.  For example, if we asked students to rate how attractive their teacher was, we would probably get a very “reliable” (consistent) measure from year to year and section to section.  But that consistency would not make up for the fact that attractiveness is unlikely to help improve the prediction of effective teaching.  So, the student survey has a high amount of consistency, but who knows what it is really measuring, since it is only weakly related to student learning gains.  It is consistent, but consistently wrong.  Our focus should be on the predictive power of teacher evaluations, and classroom observations and student surveys don’t really do anything to help with that (at least, not according to the Gates study).
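The reliability-versus-prediction distinction is easy to demonstrate with a toy simulation (the numbers here are my own hypothetical illustration, not anything from the Gates data): a stable trait like attractiveness yields highly consistent ratings year after year while predicting nothing about value-added.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000  # hypothetical number of teachers

true_va = rng.normal(size=n)  # the underlying value-added we care about
trait = rng.normal(size=n)    # a stable trait (say, attractiveness) unrelated to value-added

# Two years of ratings of the same trait: low measurement noise means high "reliability"
year1 = trait + rng.normal(scale=0.3, size=n)
year2 = trait + rng.normal(scale=0.3, size=n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(f"year-to-year consistency:  r = {corr(year1, year2):.2f}")   # high
print(f"prediction of value-added: r = {corr(year1, true_va):.2f}")  # near zero
```

The ratings come out nearly identical from year to year, yet tell us nothing about which teachers raise achievement — consistent, but consistently beside the point.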

4) They may say that classroom observations and student surveys improve on the prediction of student effort and classroom environment.

Response: As I mentioned in the post over the weekend, they don’t really have validated measures of student effort and classroom environment.  The Gates folks took a lot of flak last year for focusing on test-score gains, so they came up with some non-test-score outcome measures simply by taking some of the items from the student survey where students are asked about their effort or classroom environment.  We have no idea whether they have really measured the amount of effort students exert or the quality of the classroom environment; they are just using some survey answers on those items and claiming that they have measured those “outcomes.”  The only validated outcome measures we have in the Gates study are the test score gains, so we have to focus on those.

—————————————————————————————————

The good news is that my fears about the Gates study being used to dictate what teachers do have not been realized, at least not yet.  But it wasn’t for lack of trying.  If the classroom observations had worked a little better in predicting student learning gains, I’m sure we would have heard about how teachers should run their classrooms to produce greater gains.  But the classroom observations were so much of a dud that Gates education chief Vicki Phillips didn’t even attempt to claim that they found that drill and kill is bad or that teachers should avoid teaching to the test.

But the inability to use the classroom observations to tell teachers the “right” way of teaching is another way of saying that the classroom observations cannot be used for diagnostic purposes.  The most straightforward reading of the Gates results is that classroom observations appear to be an expensive and ineffective dud.  But it’s hard for an organization that spends $45 million on a project to scientifically validate classroom observations to admit that it failed.  It’s hard enough for a third-party evaluator to say that, let alone an in-house study about a key aspect of the Gates policy agenda.


How the Gates Foundation Spins its Research

January 7, 2012

The Gates Foundation has released the next installment of reports in their Measuring Effective Teachers Project.  When the last report was released, I found myself in a tussle with the Gates folks and Sam Dillon at the New York Times because I noted that the study’s results didn’t actually support the finding attributed to it.  Vicki Phillips, the education chief at Gates,  told the NYT and LA Times that the study showed that “drill and kill” and “teaching to the test” hurt student achievement when the study actually found no such thing.

With the latest round of reports, the Gates folks are back to their old game of spinning their results to push policy recommendations that are actually unsupported by the data.  The main message emphasized in the new round of reports is that we need multiple measures of teacher effectiveness, not just value-added measures derived from student test scores, to make reliable and valid predictions about how effective different teachers are at improving student learning.

This is the clear thrust of the newly released Policy and Practice Brief  and Research Paper and is obviously what the reporters are being told by the Gates media people.  For example, Education Week summarizes the report as follows:

…the study indicates that the gauges that appear to make the most finely grained distinctions of teacher performance are those that incorporate many different types of information, not those that are exclusively based on test scores.

And Ed Sector says:

The findings demonstrate the importance of multiple measures of teacher evaluation: combining observation scores, student achievement gains, and student feedback provided the most reliable and predictive assessment of a teacher’s effectiveness.

But buried away on p. 51 of the Research Paper in Table 16 we see that value-added measures based on student test results — by themselves — are essentially as good or better than the much more expensive and cumbersome method of combining them with student surveys and classroom observations when it comes to predicting the effectiveness of teachers.  That is, the new Gates study actually finds that multiple measures are largely a waste of time and money when it comes to predicting the effectiveness of teachers at raising student scores in math and reading.

According to Table 16, student achievement gains correlate with teachers’ underlying value-added at .69.  If the test scores are combined (with an equal weighting) with the results of a student survey and classroom observations that rate teachers according to a variety of commonly-used methods, the correlation to underlying value-added drops to between .57 and .61.  That is, combining test scores with other measures, where all measures are equally weighted, actually reduces predictive power.

The researchers also present the results of a criteria-weighted combination of student achievement gains, student surveys, and classroom observations, based on the regression coefficients of how predictive each is of student learning growth in other sections for the same teacher.  On this basis, test score gains are weighted at .729, the student survey at .179, and the classroom observations at .092.  This tells us how much more predictive test score gains are than student surveys or classroom observations.  Yet even when test score gains constitute 72.9% of the combined measure, the correlation to underlying teacher quality still ranges between .66 and .72, depending on which method is used for rating the classroom observations.  The criteria-weighted combined measure provides basically no improvement in predictive power over using test score gains by themselves.
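A toy simulation shows why folding weakly predictive measures into a strong one barely moves the needle.  The numbers below are illustrative only — the noise levels are my own assumptions, tuned so that test gains alone correlate with true value-added at roughly the .69 reported in Table 16 — but the arithmetic of the criteria weights carries over:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical number of teachers

true_va = rng.normal(size=n)  # true (unobserved) teacher value-added

# Three noisy proxies; noise scales are assumptions, chosen so test gains
# alone correlate with true value-added at roughly .69
test_gains  = true_va + rng.normal(scale=1.05, size=n)  # strong signal
survey      = true_va + rng.normal(scale=2.5, size=n)   # weak signal
observation = true_va + rng.normal(scale=4.0, size=n)   # weakest signal

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# The criteria weights reported in the paper: .729 / .179 / .092
combined = 0.729 * test_gains + 0.179 * survey + 0.092 * observation

print(f"test gains alone:  r = {corr(test_gains, true_va):.2f}")
print(f"weighted combined: r = {corr(combined, true_va):.2f}")
```

Because the weak measures get small weights, the combined measure’s correlation with true value-added lands within a few hundredths of test gains alone — the same pattern as in Table 16.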

Nor does using multiple measures improve our ability to distinguish between effective and ineffective teachers.  Using test scores alone, the difference between the top-quartile and bottom-quartile teacher in producing student value-added is .24 standard deviations in math learning growth on the state test.  If we combine test scores with student surveys and classroom observations using an equal weighting, the difference between top- and bottom-quartile teachers shrinks to between .19 and .21.  If we use the criteria weights, where test scores are 72.9% of the combined measure, the gap between top and bottom teachers ranges between .22 and .25.  In short, using multiple measures does not improve our ability to distinguish between effective and ineffective teachers.

The same basic pattern of results holds true for reading, which can be seen in Table 20 on p. 55 of the report.  Combining test score measures of teacher effectiveness with student surveys and classroom observations does slightly improve our ability to predict how students would answer survey items about their effort in school as well as how they felt about their classroom environment.  But unlike test scores, which have been shown to be strong predictors of later life outcomes, I have no idea whether these survey items accurately capture what they intend to or have any importance for students’ lives.

Adding the student surveys and classroom observation measures to test scores yields almost no benefit, but it adds an enormous amount of cost and effort to a system for measuring teacher effectiveness.  To get the classroom observations to be usable, the Gates researchers had to have four independent observations of those classrooms by four separate people.  If put into practice in schools, that would consume an enormous amount of time and money.  In addition, administering, scoring, and combining the student survey also has real costs.

So, why are the Gates folks saying that their research shows the benefits of multiple measures of teacher effectiveness when their research actually suggests virtually no benefits to combining other measures with test scores and when there are significant costs to adding those other measures?  The simple answer is politics.  Large numbers of educators and a segment of the population find relying solely on test scores for measuring teacher effectiveness to be unpalatable, but they might tolerate a system that combined test scores with classroom observations and other measures.  Rather than using their research to explain that these common preferences for multiple measures are inconsistent with the evidence, the Gates folks want to appease this constituency so that they can put a formal system of systematically measuring teacher effectiveness in place.  The research is being spun to serve a policy agenda.

This spinning of the findings is not just an accident or the result of a misunderstanding.  It is clearly deliberate.  Throughout the two reports Gates just released, they regularly engage in the same pattern of presenting the information.  They show that the classroom observation measures by themselves have weak reliability and validity in predicting effective teachers.  But if you add the student survey and then the test score measures, you get much better measures of effective teachers.  This pattern of presentation suggests the importance of multiple measures, since the classroom observations are strengthened when other measures are added.  The only place you find the reliability and validity of test scores by themselves is at the bottom of the Research Paper in Tables 16 and 20.  If both the lay version and the technical report had consistently shown how little test scores are improved by adding student surveys and classroom observations, it would be plain that test scores alone are just about as good as multiple measures.

The Gates folks never actually inaccurately describe their results (as Vicki Phillips did with the previous report).  But they are careful to frame the findings as consistently as possible with the Gates policy agenda of pushing a formal system of measuring teacher effectiveness that involves multiple measures.  And it worked, since the reporters are repeating this inaccurate spin of their findings.

———————————————————————-

(UPDATE — For a post anticipating responses from Gates, see here.)


Teachers Matter

January 3, 2012

My friend and colleague, Marcus Winters, has a new book out on how to improve the quality of the teaching workforce.  Teachers Matter is an excellent summary of the literature on how best to recruit, train, and motivate teachers.  It’s a must-read for anyone interested in merit pay, credentialing, and teacher evaluation.  It’s a particularly good book to assign for classes that cover these subjects.  Check it out.


Terry Moe on Teacher Unions

December 21, 2011

Rick Hanushek interviews Terry Moe about his new book, Special Interest, which is the definitive new work on teacher unions and education.


Nationalization Train Starts Going Off the Tracks

December 19, 2011

Let the in-fighting begin.

Supporters of digital learning, many of whom were among the strongest supporters of national standards, have organized in opposition to the imposition of a single test on the nation’s schools.  As it stands, the federal government is dumping several hundred million dollars on two testing consortia to develop assessments based on the federally “incentivized” Common Core standards.  A choice of two tests is not the same as a single test, but it is darn close.  It’s like the old joke where you have a choice between death and roo-roo.

The backers of digital learning organized by Innosight issued a group letter in which they express their desire for a multitude of testing options because they (finally) recognize the connection between choice and innovation:

Create a dynamic testing ecosystem, not another one-size-fits-all assessment. Rather than a single common test, the federal-funded opportunity offers the potential to create a vibrant assessment ecosystem comprised of multiple platforms, open-item banks, and multiple testing options that encourages deeper learning. An assessment ecosystem, rather than a single common test, will give states the flexibility to take advantage of innovations in digital learning over time while maintaining interoperability and comparability.

Signatories to this anti-national testing statement include Clayton M. Christensen, Michael B. Horn, Gisele Huff, Terry Moe, Tom Vander Ark, Bob Wise, and Julie E. Young, in addition to dozens of others.

I’m not sure why backers of digital learning have taken so long to recognize the threat posed by the nationalization movement.  And I really can’t understand why some of them have been ardent supporters of national standards.  The adoption of national standards only has the possibility of having an effect if it is tightly connected to national testing and curriculum.

The “tight-loose” idea that we can nationally impose standards but allow a wide range of assessments, curricula, and teaching methods is just an empty slogan used to conceal the inevitability of nationalizing all of these aspects of the education system if the standards are to mean anything.  If we don’t have a common way of assessing, how can we be sure that everyone is adhering to the national standards?  And if the national standards are more than vague generalities, they inevitably drive what is in the curriculum and how it must be taught.  You can have a little bit of nationalization about as much as you can be a little bit pregnant.

Despite the intellectual incoherence of some of these digital learning backers of national standards but opponents of national testing, it is nice to see the nationalization train starting to go off the tracks.  As the train moves further along and the full implications of nationalizing key aspects of the education system become more obvious to everyone, more and more people will jump that train.  Without significant coercion it will be very hard to keep everyone on board until they reach the station where standards, assessments, and curriculum are all centrally imposed.


Kim Jong Il Dies

December 19, 2011

Reports are that Kim Jong Il died of a heart attack yesterday.  I can’t be sure that Team America played no role in his passing, but I can hope that it did.  As I wrote in nominating Fasi Zaka for an Al Copeland Humanitarian Award:

…there is another essential element in the arsenal of liberty — ridicule.  Tyrants of all stripes, in addition to being monstrously cruel and evil, are also almost always laughably, pathetically, and outrageously ridiculous.

Charlie Chaplin realized this when he mocked Hitler in The Great Dictator.  In Dr. Strangelove, Stanley Kubrick portrayed the communist leader as a weepy drunk and the war-mongering general as a paranoid suffering from ED.  South Park has portrayed Osama Bin Laden as the slapstick Looney Tunes villain, Wile E. Coyote.  The Daily Show and Colbert Report make their living off of puncturing the pomposity of politicians.  Humor may not be the best weapon against tyrants, crooks, fools, and all other kinds of politicians, but it is a very important one.

Who knows?  Maybe spot-on ridicule weighs heavily on the heart of vicious tyrants.


Christopher Hitchens Dies

December 16, 2011

I was sad to hear that Christopher Hitchens had died.  He may have gotten many things wrong, but he got the one big thing of his era right — the danger posed by radical Islam to human freedom and dignity.  Check out the video above for a sample.

All of us are deeply flawed and make many mistakes.  But great intellectuals and leaders get the big things of their time right and focus their energy on that big thing.  Abraham Lincoln made many mistakes, but he recognized the evils of slavery and the threat it posed to our Union.  Franklin Roosevelt made more than his share of blunders with the economy, but he recognized the threat posed by fascism and did everything he could to defeat it.  And of course, Christopher Hitchens’ role model, George Orwell, was mistaken about many things but he correctly identified the evils of Communism and the Totalitarianism it brings.

Hitchens was a great man in the tradition of these other great men.  May his warnings about Islamic Radicalism be heeded.

(edited for typos)