We Win Pop Culture! Also, a Podcast on Win-Win

May 2, 2013

Sci-Fi fest poster

(Guest post by Greg Forster)

In a major news development, today the Heartland Institute described JPGB as a “widely read education reform-pop culture blog.” After all these years of struggling for recognition as a major voice in the pop culture world, at long last our toil and struggle have been vindicated.

Oh, and they have this podcast I did on the Win-Win report showing that the research consistently supports school choice. If you’re, you know, into that kind of thing.

Win-Win 3.0 chart

In case you forgot what that column of zeros on the right looks like, here it is again.


Gates Gets Groovy, Invests in Mood Rings

June 19, 2012

Building on their earlier $1.4 million investment in bracelets to measure skin conductivity (sweating) as a proxy for student engagement, the Gates Foundation has decided to embark on a multi-million dollar investment in mood rings.

As you can see from their research results pictured above, the mood ring is capable of identifying a variety of student emotional states that could affect the learning environment.  Teachers need to be particularly wary of the “hungry for waffles” mood because it is sometimes followed by the “flatulence” or “full bladder” mood.

Besides, mood rings are pretty groovy.  And they can’t be any dumber than these Q Sensor bracelets.


Gates Goes Wild

June 19, 2012

Gates researchers using science to enhance student learning

Even a blind squirrel occasionally finds an acorn.  Well, Diane Ravitch, Susan Ohanian, Leonie Haimson, and their tinfoil hat crew have stumbled upon some of the craziest stuff I’ve ever heard in ed reform.  It appears the Gates Foundation has spent more than $1 million to develop Galvanic Skin Response bracelets to gauge student response to instruction as part of their Measuring Effective Teachers project.  The Galvanic Skin Response measures the electrical conductance of the skin, which varies largely due to the moisture from people’s sweat.

Stephanie Simon, a Reuters reporter, summarizes the Gates effort:

The foundation has given $1.4 million in grants to several university researchers to begin testing the devices in middle-school classrooms this fall.

The biometric bracelets, produced by a Massachusetts startup company, Affectiva Inc, send a small current across the skin and then measure subtle changes in electrical charges as the sympathetic nervous system responds to stimuli. The wireless devices have been used in pilot tests to gauge consumers’ emotional response to advertising.

Gates officials hope the devices, known as Q Sensors, can become a common classroom tool, enabling teachers to see, in real time, which kids are tuned in and which are zoned out.

Um, OK.  We’ve already written about how unreliable the Gates Foundation is in describing their own research, here and here.  And we’ve already written about how the entire project of using science to discover the best way to teach is a fool’s enterprise.

And now the Gates Foundation is extending that foolish enterprise to include measuring Galvanic Skin Response as a proxy for student engagement.  This simply will not work.  The extent to which students sweat is not a proxy for engagement or for learning.  It is probably a better proxy for whether they are seated near the heater or next to a really pretty girl (or handsome boy).

Galvanic Skin Response has already been widely used as part of the “scientific” effort to detect lying.  And as any person who actually cares about science knows — lie detectors do not work.  Sweating is no more a sign of lying than it is of student engagement.

I’m worried that the Gates Foundation is turning into a Big Bucket of Crazy.  Anyone who works for Gates should be worried about this.  Anyone who is funded by Gates should be worried about this.  If people don’t stand up and tell Gates that they are off the rails, the reputation of everyone associated with Gates will be tainted.


Coulson Returns Serve to Dorn

March 19, 2012

(Guest Post by Matthew Ladner)

Andrew Coulson has replied to Sherman Dorn on the productivity implosion chart. Turns out that I had been using an old version of the chart, and Professor Dorn has conceded the larger point over the broad sweep of the spending and academic trends, but who doesn’t enjoy a tussle over methods?


More on Scientific Progressivism

January 19, 2011

I just wanted to add a few thoughts to my post yesterday.  Readers may be wondering what is wrong with using science to identify the best educational practices and then implementing those best practices.  If they are best, why wouldn’t we want to do them?

Let me answer by analogy.  We could use science to identify where we could get the highest return on capital.  If science can tell us where the highest returns can be found, why would we want to let markets allocate capital and potentially make a lot of mistakes?  Government could just use science and avoid all of those errors by making sure capital went to where it could best be used.

Of course, we tried this approach in the Soviet Union and it failed miserably.  The primary problem is that science is always uncertain and susceptible to corruption.  We can run models to measure returns on capital, but we have uncertainty about the models and we have uncertainty about the future.  Markets provide a reality test to scientific models by allowing us to choose among competing models and experience the consequences of choosing wisely or not.  Science can advise us, but only choice, freedom, and experience permit us to benefit from what science has to offer.

And even more dangerous is that in the absence of choice and competition among scientific models, authorities will allow their own interests or preferences to distort what they claim science has to say.  For an excellent example of this, check out the story of Lysenko and Soviet research on genetics.  For decades Soviet science was compelled to believe that environmental influences could be inherited.

Science facilitates progress through the crucible of market tests.  Science without markets facilitates stronger authoritarianism.


The Dead End of Scientific Progressivism

January 18, 2011

In Education Myths I argued that we needed to rely on science rather than our direct experience to identify effective policies.  Our eyes can mislead us, while scientific evidence has the systematic rigor to guide us more accurately.

That’s true, but I am now more aware of the opposite failing — believing that we can resolve all policy disputes and identify the “right way” to educate all children solely by relying on science.  Science has its limits.  Science cannot adjudicate among the competing values that might attract us to one educational approach over another.  Science usually tells us about outcomes for the typical or average student and cannot easily tell us about what is most effective for individual students with diverse needs.  Science is slow and uncertain, while policy and practice decisions have to be made right now whether a consensus of scientific evidence exists or not.  We should rely on science when we can but we also need to be humble about what science can and can’t address.

I was thinking about this while reflecting on the Gates Foundation’s Measuring Effective Teachers Project.  The project is an ambitious $45 million enterprise to improve the stability of value-added measures while identifying effective practices that contribute to higher value-added performance.  These are worthy goals.  The project intends to advance those goals by administering two standardized tests to students in 8 different school systems, surveying the students, and videotaping classroom lessons.

The idea is to see if combining information from the tests, survey, and classroom observations could produce more stable measures of teacher contributions to learning than is possible by just using the state test.  And since they are observing classrooms and surveying students, they can also identify certain teacher practices and techniques that might be associated with greater improvement.  The Gates folks are using science to improve the measures of student progress and to identify what makes a more effective teacher.
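As a rough illustration of why combining several measures can stabilize estimates, here is a toy simulation, with entirely invented noise levels (not figures from the Gates project), in which three independent, equally noisy measures of the same underlying teacher effect are averaged:

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.5     # hypothetical teacher's true contribution
n_teachers = 5_000

# Three noisy, independent measures of the same underlying effect:
# stand-ins for a state test, a second test, and student surveys.
state_test = true_effect + rng.normal(scale=0.30, size=n_teachers)
second_test = true_effect + rng.normal(scale=0.30, size=n_teachers)
survey = true_effect + rng.normal(scale=0.30, size=n_teachers)

# Averaging independent measures shrinks the noise by roughly sqrt(3).
combined = (state_test + second_test + survey) / 3

print(f"single-measure SD: {state_test.std():.3f}")
print(f"combined SD:       {combined.std():.3f}")
```

The point is only statistical: if the three instruments carry independent error, the averaged measure is noticeably more stable than any one of them, which is the logic behind combining tests, surveys, and observations.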

This is a great use of science, but there are limits to what we can expect.  When identifying practices that are more effective, we have to remember that this is just more effective for the typical student.  Different practices may be more effective for different students.  In principle science could help address this also, but even this study, with 3,000 teachers, is not nearly large enough to produce a fine-grained analysis of what kind of approach is most effective for many different kinds of kids.

My fear is that the researchers, their foundation backers, and, most importantly, the policymaker and educator consumers of the research are insensitive to these limitations of science.  I fear that the project will identify the “right” way to teach and then it will be used to enforce that right way on everyone, even though it is highly likely that there are different “right” ways for different kids.

We already have a taste of this from the preliminary report that Gates issued last month.  Following its release Vicki Phillips, the head of education at the Gates Foundation, told the New York Times: “Teaching to the test makes your students do worse on the tests.”  Science had produced its answer — teachers should stop teaching to the test, stop drill and kill, and stop test prep (which the Gates officials and reporters used as interchangeable terms).

Unfortunately, Vicki Phillips misread her own Foundation’s report.  On p. 34 the correlation between test prep and value-added is positive, not negative.  If the study shows any relationship between test prep and student progress, it is that test prep contributes to higher value-added.  Let’s leave aside the fact that these were simply a series of pairwise correlations and not the sort of multivariate analysis that you would expect if you were really trying to identify effective teaching practices.  Vicki Phillips was just plain wrong in what she said.  Even worse, despite having the error pointed out, neither the Gates Foundation nor the New York Times has considered it worthwhile to post a public correction.  Science says what I say it says.

And this is the greatest danger of a lack of humility in the application of science to public policy.  Science can be corrupted so that it simply becomes a shield disguising the policy preferences of those in authority.  How many times have you heard a school official justify a particular policy by saying that it is supported by research when in fact no such research exists?  This (mis)use of science is a way for authority figures to tell their critics, “shut up!”

But even if the Gates report had conducted multivariate analyses on effective teaching practices and even if Vicki Phillips could accurately describe the results of those analyses, the Gates project of using science to identify the “best” practices is doomed to failure.  The very nature of education is that different techniques are more effective in different kinds of situations for different kinds of kids.  Science can identify the best approach for the average student but it cannot identify the best approach for each individual student.  And if students are highly varied in their needs, which I believe they are, this is a major limitation.

But as the Gates Foundation pushes national standards with new national tests, they seem inclined to impose the “best” practices that science identified on all students.  The combination of Gates building a national infrastructure for driving educator behavior while launching a gigantic scientific effort to identify the best practices is worrisome.

There is nothing wrong with using science to inform local practice.  But science needs markets to keep it honest.  If competing educators can be informed by science, then they can pick among competing claims about what science tells us.  And they can learn from their experience whether the practices that are recommended for the typical student by science work in the particular circumstances in which they are operating.

But if the science of best educator practice is combined with a national infrastructure of standards and testing, then local actors cannot adjudicate among competing claims about what science says.  What the central authorities decide science says will be infused in the national standards and tests and all must adhere to that vision if they wish to excel along these centralized criteria.  Even if the central authority completely misunderstands what science has to say, we will all have to accept that interpretation.

I don’t mean to be overly alarmist.  Gates has a lot of sensible people working for them and there are many barriers remaining before we fully implement national standards and testing.  My concern is that the Gates Foundation is being informed by an incorrect theory of reform.  Reform does not come from science identifying the right thing to do and then a centralized authority imposing that right thing on everyone.  Progress comes from decentralized decision-makers having the freedom and motivation to choose among competing claims about what is right according to science.

(edited for typos)


Drill and Kill Kerfuffle

December 16, 2010

The reactions of New York Times reporter Sam Dillon and LA Times reporter Jason Felch to my post on Monday about erroneous claims in their coverage of a new Gates report could not have been more different.  Felch said he would look into the issue, discovered that the claimed negative relationship between test prep and value-added was inaccurate, and is now working on a correction with his editors.

Sam Dillon took a very different tack, accusing me of “suggesting on the internet that I had misinterpreted an interview, and then you repeated the same thing about the Los Angeles Times. That was just a sloppy and irresponsible error.”  I’m not sure how Dillon jumps to this thin-skinned defensiveness when I clearly said I did not know where the error was made: “I don’t know whether something got lost in the translation between the researchers and Gates education chief, Vicki Phillips, or between her and Sam Dillon at the New York Times, but the article contains a false claim that needs to be corrected before it is used to push changes in education policy and practice.”

But more importantly, Dillon failed to check the accuracy of the disputed claim with independent experts.  Instead, he simply reconfirmed the claim with Gates officials: “For your information, I contacted the Gates Foundation after our correspondence and asked them if I had misquoted or in any way misinterpreted either Vicki Phillips, or their report on their research. They said, ‘absolutely not, you got it exactly right.’”

He went on to call my efforts to correct the claim “pathetic, sloppy, and lazy, and by the way an insult.”  I guess Dillon thinks that being a reporter for the New York Times means never having to say you’re sorry — or consult independent experts to resolve a disputed claim.

If Dillon wasn’t going to check with independent experts, I decided that I should — just to make sure that I was right in saying that the claims in the NYT and LAT coverage were unsupported by the findings in the Gates report.

Just to review, here is what Dillon wrote in the New York Times: “One notable early finding, Ms. Phillips said, is that teachers who incessantly drill their students to prepare for standardized tests tend to have lower value-added learning gains than those who simply work their way methodically through the key concepts of literacy and mathematics.”  And here is what Jason Felch wrote in the LA Times: “But the study found that teachers whose students said they ‘taught to the test’ were, on average, lower performers on value-added measures than their peers, not higher.”  And the correlations in the Gates report between student reports of test prep and value-added on standardized tests were all positive: “We spend a lot of time in this class practicing for the state test.” (ρ=0.195), “I have learned a lot this year about the state test.” (ρ=0.143), “Getting ready for the state test takes a lot of time in our class.” (ρ=0.103).  The report does not actually contain items that specifically mention “drill,” “work their way methodically through the key concepts of literacy and mathematics,” or “taught to the test,” but I believe the reporters (and perhaps Gates officials) are referencing the test prep items with these phrases.

I sent links to the coverage and the Gates report to a half-dozen leading economists to ask if the claims mentioned above were supported by the findings.  The following reply from Jacob Vigdor, an economist at Duke, was fairly representative of what they said even if it was a bit more direct than most:

I looked carefully at the report and come to the same conclusion as you: these correlations are positive, not negative.  The NYT and LAT reports are both plainly inconsistent with what is written in the report.  A more accurate statement would be along the lines of “test preparation activities appear to be less important determinants of value added than [caring teachers, teacher control in the classroom, etc].”  But even this statement is subject to the caveat that pairwise correlations don’t definitively prove the importance of one factor over another.  Maybe the reporters are describing some other analysis that was not in the report (e.g., regression results that the investigators know about but do not appear in print), but even in that case they aren’t really getting the story right.  Even in that scenario, the best conclusion (given positive pairwise correlations and a hypothetically negative regression coefficient) would be that teachers who possess all these positive characteristics tend to emphasize test preparation as well.

Put another way, it’s always good to have a caring teacher who is in control of the classroom, makes learning fun, and demands a lot of her students.  Among the teachers who share these characteristics, the best ones (in terms of value added) appear to also emphasize preparation for standardized tests.  I say “appear” because one would need a full-fledged multivariate regression analysis, and not pairwise correlations, to determine this definitively.

Another leading economist, who preferred not to be named, wrote: “I looked back over the report and I think you are absolutely right!”  I’m working on getting permission to quote others, but you get the idea.

In addition to confirming that a positive correlation for test prep items means that it contributes to value-added, not detracts from it, several of these leading economists emphasized the inappropriateness of comparing correlations to draw conclusions about whether test prep contributes to value-added any more or less than other teacher practices observed by students.  They noted that any such comparison would require a multivariate analysis and not just a series of pairwise correlations.  And they also noted that any causal claim about the relative effectiveness of test prep would require some effort to address the endogeneity of which teachers engage in more test prep.

As David Figlio, an economist at Northwestern University, put it:

You’re certainly correct here.  A positive pairwise correlation means that these behaviors are associated with higher performance on standardized tests, not lower performance.  The only way that it could be an accurate statement that test prep is causing worse outcomes would be if there was a negative coefficient on test prep in a head-to-head competition in a regression model — though even then, one would have to worry about endogeneity: maybe teachers with worse-performing students focus more on test prep, or maybe lower-performing students perceive test prep to be more oppressive (of course, this could go the other way as well.)  But that was not the purpose or intent of the report.  The report does not present this as a head-to-head comparison, but rather to take a first look at the correlates between practice measures and classroom performance.
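The scenario these economists describe, where every pairwise correlation is positive even though a variable’s direct effect in a regression could be negative, is easy to illustrate with a toy simulation.  All of the numbers below are hypothetical, chosen only to show the statistical point, not taken from the Gates report:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process: an unobserved "teacher quality"
# drives both test prep and value-added, while test prep itself has a
# small *negative* direct effect.
quality = rng.normal(size=n)
test_prep = 0.8 * quality + rng.normal(scale=0.5, size=n)
value_added = quality - 0.3 * test_prep + rng.normal(scale=0.5, size=n)

# Pairwise correlation is positive, because test prep proxies for quality.
r = np.corrcoef(test_prep, value_added)[0, 1]

# A multivariate regression that controls for quality recovers the
# negative direct coefficient on test prep.
X = np.column_stack([np.ones(n), test_prep, quality])
beta, *_ = np.linalg.lstsq(X, value_added, rcond=None)

print(f"pairwise correlation:      {r:.2f}")     # positive
print(f"regression coefficient:    {beta[1]:.2f}")  # near -0.3
```

This is exactly why a positive pairwise correlation cannot, by itself, establish a negative effect of test prep, and also why pairwise correlations cannot settle a head-to-head comparison of practices: the two statistics answer different questions.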

There was no reason for this issue to have developed into the controversy that it has. The coverage contains obvious errors that should have been corrected quickly and clearly, just as Jason Felch is doing.   Tom Kane, Vicki Phillips, and other folks at Gates should have immediately issued a clarification as soon as they were alerted to the error, which was on Monday.

And while I did not know where the error occurred when I wrote the blog post on Monday, the indications now are that there was a miscommunication between the technical people who wrote the report and non-technical folks at Gates, like Vicki Phillips and the PR staff.  In other words, Sam Dillon can relax since the mistake appears to have originated within Gates (although Dillon’s subsequent defensiveness, name-calling, and failure to check with independent experts hardly bring credit to the profession of journalism).

The sooner Gates issues a public correction, the sooner we can move beyond this dispute over what is actually a sidebar in their report and focus instead on the enormously interesting project on which they’ve embarked to improve measures of teacher effectiveness.  An apology from Sam Dillon would also be nice but I’m not holding my breath.



Finland Sucks

December 7, 2010

Actually, I don’t really think so.  But if I were Diane Ravitch and looked at the trend in PISA for Finland as she looked at the trend in NAEP for New York City, I would see that Finland has declined in reading, math, and science.  And then I would (wrongly) conclude that Finland sucks and is doing things all wrong.

Table 5.1 Finland’s mean scores on reading, mathematics and science scales in PISA (p. 118)

                 PISA 2000   PISA 2003   PISA 2006   PISA 2009
Reading                546         543         547         536
Mathematics              –         544         548         541
Science                  –           –         563         554

(Mathematics and science were first assessed as major PISA domains in 2003 and 2006, respectively, hence the blanks.)

Or perhaps if I really wanted to be like Diane Ravitch I would switch from looking at trends to levels of achievement, like when she looks at Massachusetts.  In that case, I would still think Finland is great and doing everything right.

Or maybe I could be like Diane Ravitch and switch to a different test that produced results more to my liking, like when Diane stopped paying attention to NAEP for New York City when it showed significant gains and started focusing instead on problems in the state test measures.

That’s the problem with being a manipulative propagandist.  It’s so hard to keep your story straight from one deception to another.


Is Ravitch Really A Great Historian?

November 30, 2010

Given Diane Ravitch’s clear record of selectively and misleadingly citing the evidence on current education debates, we should wonder whether her much-lauded historical work contains similar distortions.  Someone so willing to pick and choose the evidence to serve her argument about current debates may well have the same proclivity to advance her preferred historical interpretation.

Detecting how Ravitch selectively reads the current evidence is relatively easy because the full scope of current research is knowable without too much effort.  But the full set of historical evidence from which an author chooses is less easily known to a lay reader.  How can anyone beyond the handful of scholars who have reviewed the original documents on a particular subject know whether Diane Ravitch or any other historian is correctly selecting and interpreting historical evidence?

The reality is that we can’t.  Most people tend to think that a historian is good because he or she writes well and makes an argument that is generally preferred by the reader.  It’s even unreliable to fully trust the opinion of other historians when assessing the quality of historical work.  Very few historians are intimately familiar with the same material, especially if the topic is highly specialized — like the history of American education.  And among those few historians their judgment on the quality of another person’s work may be colored by their professional interests in advancing similar interpretations or hindering opposing ones.

In short, it is very hard to know whether someone is really a great historian.  It is certainly harder to know the quality of historical work than empirical social science, especially when data sets are widely available and analyses can be replicated without too much effort.

Given that it is hard to know the quality of historical work and given Diane Ravitch’s distortion of the evidence in current debates, I’m inclined to doubt the quality of her earlier historical work.  Ravitch may have changed her views on some things but I highly doubt she has changed her standards of scholarship.  So, if her scholarship is lousy now, perhaps it was lousy before.

I’d be curious to hear examples that anyone may have of where Ravitch was sloppy or misleading in her historical work.  I bet they are out there even if they are harder to discover than her current sloppy and misleading work.


What Doesn’t Work Clearinghouse

October 4, 2010

The U.S. Department of Education’s “What Works Clearinghouse” (WWC) is supposed to adjudicate the scientific validity of competing education research claims so that policymakers, reporters, practitioners, and others don’t have to strain their brains to do it themselves.  It would be much smarter for folks to exert the mental energy themselves rather than trust a government-operated truth committee to sort things out for them.

WWC makes mistakes, is subject to political manipulation, and applies arbitrary standards.  In short, what WWC says is not The Truth.  WWC is not necessarily less reliable than any other source that claims to adjudicate The Truth for you.  Everyone may make mistakes, distort results, and apply arbitrary standards.  The problem is that WWC has the official endorsement of the U.S. Department of Education, so many people fail to take their findings with the same grains of salt that they would to the findings of any other self-appointed truth committee.  And with the possibility that government money may be conditioned on WWC endorsement, WWC’s shortcomings are potentially more dangerous.

I could provide numerous examples of WWC’s mistakes, political manipulation, and arbitrariness, but for the brevity of a blog post let me illustrate my point with just a few.

First, WWC was sloppy and lazy in its recent finding that the Milwaukee voucher evaluation, led by my colleagues Pat Wolf and John Witte, failed to meet “WWC evidence standards” because “the authors do not provide evidence that the subsamples of voucher recipients and public school comparison students analyzed in this study were initially equivalent in math and reading achievement.” WWC justifies their conclusion with a helpful footnote that explains: “At the time of publication, the WWC had contacted the corresponding author for additional information regarding the equivalence of the analysis samples at baseline and no response had been received.”

But if WWC had actually bothered to read the Milwaukee reports they would have found the evidence of equivalence they were looking for.  The Milwaukee voucher evaluation that Pat and John are leading has a matched-sample research design.  In fact, the research team produced an entire report whose purpose was to demonstrate that the matching had worked and produced comparable samples. In addition, in the 3rd Year report the researchers devoted an entire section (see appendix B) to documenting the continuing equivalence of the matched samples despite some attrition of students over time.

Rather than reading the reports and examining the evidence on the comparability of the matched samples, WWC decided that the best way to determine whether the research met their standards for sample equivalence was to email John Witte and ask him.  I guess it’s all that hard work that justifies the multi-million dollar contract Mathematica receives from the U.S. Department of Education to run WWC.

As it turns out, Witte was traveling when WWC sent him the email.  When he returned he deleted their request along with a bunch of other emails without examining it closely.  But WWC took Witte’s non-response as confirmation that there was no evidence demonstrating the equivalence of the matched samples.  WWC couldn’t be bothered to contact any of the several co-authors.  They just went for their negative conclusion without further reading, thought, or effort.

I can’t prove it (and I’m sure my thought-process would not meet WWC standards), but I’ll bet that if the subject of the study was not vouchers, WWC would have been sure to read the reports closely and make extra efforts to contact co-authors before dismissing the research as failing to meet their standards.  But voucher researchers have grown accustomed to double-standards when others assess their research.  It’s just amazingly ironic to see the federally-sponsored entity charged with maintaining consistent and high standards fall so easily into their own double-standard.

Another example — I served on a WWC panel regarding school turnarounds a few years ago.  We were charged with assessing the research on how to successfully turn around a failing school.  We quickly discovered that there was no research that met WWC’s standards on that question.  I suggested that we simply report that there is no rigorous evidence on this topic.  The staff rejected that suggestion, emphasizing that the Department of Education needed to have some evidence on effective turnaround strategies.

I have no idea why the political needs of the Department should have affected the truth committee in assessing the research, but it did.  We were told to look at non-rigorous research, including case-studies, anecdotes, and our own experience to do our best in identifying promising strategies.  It was strange — there were very tight criteria for what met WWC standards, but there were effectively no standards when it came to less rigorous research.  We just had to use our professional judgment.

We ended up endorsing some turnaround strategies (I can’t even remember what they were) but we did so based on virtually no evidence.  And this was all fine as long as we said that the conclusions were not based on research that met WWC standards.  I still don’t know what would have been wrong with simply saying that research doesn’t have much to tell us about effective turnaround strategies, but I guess that’s not the way truth committees work.  Truth committees have to provide the truth even when it is false.

The heart of the problem is that science has never depended on government-run truth committees to make progress.  It is simply not possible for the government to adjudicate the truth on disputed topics because the temptation to manipulate the answer or simply to make sloppy and lazy mistakes is all too great.  This is not a problem that is particular to the Obama Administration or to Mathematica.  My second example was from the Bush Administration when WWC was run by AIR.

The hard reality is that you can never fully rely on any authority to adjudicate the truth for you.  Yes, conflicting claims can be confusing.  Yes, it would be wonderfully convenient if someone just sorted it all out for us.  But once we give someone else the power to decide the truth on our behalf, we are prey to whatever distortions or mistakes they may make.  And since self-interest introduces distortions and the tendency to make mistakes, the government is a particularly untrustworthy entity to rely upon when it comes to government policy.

Science has always made progress by people sorting through the mess of competing, often technical, claims.  When official truth committees have intervened, it has almost always hindered scientific progress.  Remember that it was the official truth committee that determined that Galileo was wrong.  Truth committees have taken positions on evolution, global warming, and a host of other controversial topics.  It simply doesn’t help.

We have no alternative to sorting through the evidence and trying to figure these things out ourselves.  We may rely upon the expertise of others in helping us sort out competing claims, but we should always do so with caution, since those experts may be mistaken or even deceptive.  But when the government starts weighing in as an expert, it speaks with far too much authority and can be much more coercive.  A What Works Clearinghouse simply doesn’t work.

