Understanding the Gates Foundation’s Measuring Effective Teachers Project

January 9, 2013

If I were running a school I’d probably want to evaluate teachers using a mixture of student test score gains, classroom observations, and feedback from parents, students, and other staff.  But I recognize that different schools have different missions and styles that can best be assessed using different methods.  I wouldn’t want to impose on all schools in a state or the nation a single, mechanistic system for evaluating teachers since that is likely to be a one-size-fits-none solution.  There is no single best way to evaluate teachers, just like there is no single best way to educate students.

But the folks at the Gates Foundation, afflicted with PLDD, don’t see things this way.  They’ve been working with politicians in Illinois, Los Angeles, and elsewhere to centrally impose teacher evaluation systems, but they’ve encountered stiff resistance.  In particular, they’ve noticed that teachers and others have expressed strong reservations about any evaluation system that relies too heavily on student test scores.

So the folks at Gates have been trying to scientifically validate a teacher evaluation system that involves a mix of test score gains, classroom observations, and student surveys so that they can overcome resistance to centrally imposed, mechanistic evaluation systems.  If they can reduce reliance on test scores in that system while still carrying the endorsement of “science,” the Gates folks imagine that politicians, educators, and others will all embrace the Gates central planning fantasy.

Let’s leave aside for the moment the political reality, demonstrated recently in Chicago and Los Angeles, that teachers are likely to fiercely resist any centrally imposed, mechanistic evaluation system regardless of the extent to which it relies on test scores.  The Gates folks want to put on their lab coats and throw the authority of science behind a particular approach to teacher evaluation.  If you oppose it you might as well deny global warming.  Science has spoken.

So it is no accident that the release of the third and final round of reports from the Gates Foundation’s Measuring Effective Teachers project was greeted with the following headline in the Washington Post: “Gates Foundation study: We’ve figured out what makes a good teacher,”  or this similarly humble claim in the Denver Post: “Denver schools, Gates foundation identify what makes effective teacher.”  This is the reaction that the Gates Foundation was going for — we’ve used science to discover the correct formula for evaluating teachers.  And by implication, we now know how to train and improve teachers by using the scientifically validated methods of teaching.

The only problem is that things didn’t work out as the Gates folks had planned.  Classroom observations make virtually no independent contribution to the predictive power of a teacher evaluation system.  You have to dig to find this, but it’s right there in Table 1 on page 10 of one of the technical reports released yesterday.  In a regression to predict student test score gains using out-of-sample test score gains for the same teacher, student survey results, and classroom observations, there is virtually no relationship between test score gains and either classroom observations or student survey results.  In only 3 of the 8 models presented is there any statistically significant relationship between either classroom observations or student surveys and test score gains (I’m excluding the 2 instances where they report p < .1 as statistically significant).  And in all 8 models the point estimates suggest that a standard deviation improvement in classroom observation or student survey results is associated with less than a .1 standard deviation increase in test score gains.

Not surprisingly, a composite teacher evaluation measure that mixes classroom observations and student survey results with test score gains is generally no better and sometimes much worse at predicting out-of-sample test score gains.  The Gates folks trumpet the finding that the combined measures are more “reliable,” but that only means that they are less variable, not any more predictive.

But “the best mix” according to the “policy and practitioner brief” is “a composite with weights between 33 percent and 50 percent assigned to state test scores.”  How do they know this is the “best mix?”  It generally isn’t any better at predicting test score gains.  And collecting the classroom observations involves an enormous expense and hassle.  To get the measure as “reliable” as they did without sacrificing too much predictive power, the Gates team had to observe each teacher at least four different times by at least two different coders, including one coder outside of the school.  To observe 3.2 million public school teachers for four hours by staff compensated at $40 per hour would cost more than $500 million each year.  The Gates people also had to train the observers for at least 17 hours and even after that had to throw out almost a quarter of those observers as unreliable.  To do all of this might cost about $1 billion each year.
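The $500 million figure is simple multiplication of the numbers above. The minimal Python sketch below shows that arithmetic. It covers only observer wages for four hours per teacher, and `annual_observation_cost` is a hypothetical name used for illustration. It does not try to reproduce the roughly $1 billion total, because the post does not break down the training and other costs behind that estimate.

```python
# Back-of-envelope check of the observer-wage estimate above.
# All three inputs are the post's own figures; nothing else is assumed.
TEACHERS = 3_200_000        # public school teachers
HOURS_PER_TEACHER = 4       # observer-hours per teacher per year
WAGE_PER_HOUR = 40.0        # observer compensation, dollars per hour


def annual_observation_cost(teachers: int, hours: float, wage: float) -> float:
    """Return the yearly observer wage bill in dollars (wages only)."""
    return teachers * hours * wage


cost = annual_observation_cost(TEACHERS, HOURS_PER_TEACHER, WAGE_PER_HOUR)
print(f"${cost:,.0f} per year")  # $512,000,000 per year
```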

And what would we get for this billion?  Well, we might get more consistent teacher evaluation scores, but we’d get basically no improvement in the identification of effective teachers.  And that’s the “best mix?”  Best for what?  It’s best for the political packaging of a centrally imposed, mechanistic teacher evaluation system, which is what this is all really about.  Vicki Phillips, who heads the Gates education efforts, captured in this comment what I think they are really going for with a composite evaluation score:

Combining all three measures into a properly weighted index, however, produced a result “teachers can trust,” said Vicki Phillips, a director in the education program at the Gates Foundation.

It’ll cost a fortune, it doesn’t improve the identification of effective teachers, but we need to do it to overcome resistance from teachers and others.  Not only will this not work, but in spinning the research as they have, the Gates Foundation is clearly distorting the straightforward interpretation of their findings: a mechanistic system of classroom observation provides virtually nothing for its enormous cost and hassle.  Oh, and this is the case when no stakes were attached to the classroom observations.  Once we attach all of this to pay or continued employment, their classroom observation system will only get worse.

I should add that if classroom observations aren’t useful as predictors, they also can’t be used effectively for diagnostic purposes.  An earlier promise of this project was that they would figure out which teacher evaluation rubrics were best and which sub-components of those rubrics were most predictive of effective teaching.  But that clearly hasn’t panned out.  In the new reports I can’t find anything about the diagnostic potential of classroom observations, which is not surprising since those observations are not predictive.

So, rather than having “figured out what makes a good teacher” the Gates Foundation has learned very little in this project about effective teaching practices.  The project was an expensive flop.  Let’s not compound the error by adopting this expensive flop as the basis for centrally imposed, mechanistic teacher evaluation systems nationwide.

(Edited for typos and to add links.  To see a follow-up post, click here.)

How the Gates Foundation Spins its Research

January 7, 2012

The Gates Foundation has released the next installment of reports in their Measuring Effective Teachers Project.  When the last report was released, I found myself in a tussle with the Gates folks and Sam Dillon at the New York Times because I noted that the study’s results didn’t actually support the finding attributed to it.  Vicki Phillips, the education chief at Gates, told the NYT and LA Times that the study showed that “drill and kill” and “teaching to the test” hurt student achievement when the study actually found no such thing.

With the latest round of reports, the Gates folks are back to their old game of spinning their results to push policy recommendations that are actually unsupported by the data.  The main message emphasized in the new round of reports is that we need multiple measures of teacher effectiveness, not just value-added measures derived from student test scores, to make reliable and valid predictions about how effective different teachers are at improving student learning.

This is the clear thrust of the newly released Policy and Practice Brief and Research Paper and is obviously what the reporters are being told by the Gates media people.  For example, Education Week summarizes the report as follows:

…the study indicates that the gauges that appear to make the most finely grained distinctions of teacher performance are those that incorporate many different types of information, not those that are exclusively based on test scores.

And Ed Sector says:

The findings demonstrate the importance of multiple measures of teacher evaluation: combining observation scores, student achievement gains, and student feedback provided the most reliable and predictive assessment of a teacher’s effectiveness.

But buried away on p. 51 of the Research Paper in Table 16 we see that value-added measures based on student test results, by themselves, are essentially as good as or better than the much more expensive and cumbersome method of combining them with student surveys and classroom observations when it comes to predicting the effectiveness of teachers.  That is, the new Gates study actually finds that multiple measures are largely a waste of time and money when it comes to predicting the effectiveness of teachers at raising student scores in math and reading.

According to Table 16, student achievement gains correlate with the underlying value-added by teachers at .69.  If the test scores are combined (with an equal weighting) with the results of a student survey and classroom observations that rate teachers according to a variety of commonly used methods, the correlation to underlying value-added drops to between .57 and .61.  That is, combining test scores with other measures where all measures are equally weighted actually reduces reliability.
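Why would adding more measures lower the correlation? One standard way to see it uses the textbook formula for the validity of a weighted composite of standardized measures. The composite's correlation with the criterion equals the weighted sum of each measure's correlation with the criterion, Σ wᵢ rᵢ, divided by the composite's standard deviation, √(Σᵢ Σⱼ wᵢ wⱼ rᵢⱼ). This is a generic psychometric identity, not necessarily the calculation the MET researchers performed. The inputs below are hypothetical, except the .69 borrowed from Table 16.

```python
import math
from typing import Sequence


def composite_validity(weights: Sequence[float],
                       validities: Sequence[float],
                       intercorr: Sequence[Sequence[float]]) -> float:
    """Correlation between a weighted sum of standardized measures and a criterion.

    weights    -- weight w_i on each standardized measure X_i
    validities -- corr(X_i, criterion) for each measure
    intercorr  -- matrix of corr(X_i, X_j), with 1.0 on the diagonal
    """
    n = len(weights)
    # Covariance of the composite with the standardized criterion.
    numerator = sum(w * r for w, r in zip(weights, validities))
    # Variance of the composite: sum over i, j of w_i * w_j * corr(X_i, X_j).
    variance = sum(weights[i] * weights[j] * intercorr[i][j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)


# Hypothetical illustration, NOT MET figures beyond the .69: a test-score
# measure with validity .69 plus a second measure with validity .20,
# the two measures correlated at .30, combined with equal weights.
alone = composite_validity([1.0], [0.69], [[1.0]])
mixed = composite_validity([0.5, 0.5], [0.69, 0.20], [[1.0, 0.3], [0.3, 1.0]])
print(round(alone, 3), round(mixed, 3))  # 0.69 0.552
```

Under these assumed inputs, giving equal weight to a weakly predictive second measure pulls the composite's correlation below what the test-score measure achieves alone. That is the same direction as the Table 16 pattern described above.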

The researchers also present the results of a criteria-weighted combination of student achievement gains, student surveys, and classroom observations based on the regression coefficients of how predictive each is of student learning growth in other sections for the same teacher.  Based on this, the test score gains are weighted at .729, the student survey at .179, and the classroom observations at .092.  This tells us how much more predictive test score gains are than student surveys or classroom observations.  Yet even when test score gains constitute 72.9% of the combined measure, the correlation to underlying teacher quality still ranges between .66 and .72, depending on which method is used for rating the classroom observations.  The criteria-weighted combined measure provides basically no improvement in reliability over using test score gains by themselves.
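The sketch below shows how a single teacher's composite score changes under the two weighting schemes. It assumes each measure has already been converted to a z-score; the post does not describe the exact standardization the researchers used. The dictionary keys and `composite_score` are hypothetical names. Only the .729/.179/.092 weights come from the post.

```python
from typing import Dict

# Criterion weights as reported in the post.
CRITERIA_WEIGHTS: Dict[str, float] = {
    "test_gains": 0.729,
    "student_survey": 0.179,
    "observation": 0.092,
}
# Equal weighting of the same three measures, for comparison.
EQUAL_WEIGHTS: Dict[str, float] = {name: 1 / 3 for name in CRITERIA_WEIGHTS}


def composite_score(z_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of a teacher's standardized (z-score) measures.

    Dividing by the weight total keeps the result on the z-score scale
    even if the weights do not sum exactly to 1.
    """
    total = sum(weights.values())
    return sum(weights[name] * z_scores[name] for name in weights) / total


# Hypothetical teacher: one SD above average on test-score gains,
# exactly average on the student survey and the observation rubric.
teacher = {"test_gains": 1.0, "student_survey": 0.0, "observation": 0.0}
print(round(composite_score(teacher, CRITERIA_WEIGHTS), 3))  # 0.729
print(round(composite_score(teacher, EQUAL_WEIGHTS), 3))     # 0.333
```

Under the criterion weights this teacher keeps most of the test-score advantage. Under equal weights two-thirds of it is diluted by the other two measures.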

And using multiple measures does not improve our ability to distinguish between effective and ineffective teachers.  Using test scores alone, the difference between the top-quartile and bottom-quartile teacher in producing student value-added is .24 standard deviations in math learning growth on the state test.  If we combine test scores with student surveys and classroom observations using an equal weighting, the difference between top- and bottom-quartile teachers shrinks to between .19 and .21.  If we use the criteria weights, where test scores are 72.9% of the combined measure, the gap between top- and bottom-quartile teachers ranges between .22 and .25.  In short, using multiple measures does not improve our ability to distinguish between effective and ineffective teachers.
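The quartile comparison can be sketched the same way: sort teachers by a measure, then compare the average outcome of the top and bottom quarters. This is a simplification under stated assumptions. The `quartile_gap` function and the eight-teacher data are hypothetical, not MET data or the researchers' code. The sketch only illustrates what a top-versus-bottom-quartile gap means.

```python
import statistics
from typing import Sequence


def quartile_gap(ranking: Sequence[float], outcome: Sequence[float]) -> float:
    """Mean outcome of the top quartile minus the bottom quartile.

    Teachers are sorted by `ranking` (e.g. a composite or a test-score-only
    measure) and compared on `outcome` (e.g. value-added measured elsewhere).
    If the count is not divisible by 4, the middle group absorbs the remainder.
    """
    if len(ranking) != len(outcome):
        raise ValueError("ranking and outcome must have the same length")
    quarter = len(ranking) // 4
    if quarter == 0:
        raise ValueError("need at least four teachers")
    # Order outcomes by the ranking measure, lowest first.
    ordered = [o for _, o in sorted(zip(ranking, outcome), key=lambda pair: pair[0])]
    bottom, top = ordered[:quarter], ordered[-quarter:]
    return statistics.mean(top) - statistics.mean(bottom)


# Hypothetical data for eight teachers (not MET data).
ranking = [0.1, 0.4, -0.3, 1.2, -1.0, 0.8, 0.0, -0.6]
outcome = [0.02, 0.05, -0.04, 0.12, -0.10, 0.09, 0.01, -0.06]
print(round(quartile_gap(ranking, outcome), 3))  # 0.185
```

Running the same function with a test-score-only ranking and with a composite ranking, against the same outcome, is the kind of comparison the figures above summarize.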

The same basic pattern of results holds true for reading, which can be seen in Table 20 on p. 55 of the report.  Combining test score measures of teacher effectiveness with student surveys and classroom observations does slightly improve our ability to predict how students would answer survey items about their effort in school as well as how they felt about their classroom environment.  But while test scores have been shown to be strong predictors of later life outcomes, I have no idea whether these survey items accurately capture what they intend or have any importance for students’ lives.

Adding the student surveys and classroom observation measures to test scores yields almost no benefits, but it adds an enormous amount of cost and effort to a system for measuring teacher effectiveness.  To get the classroom observations to be usable, the Gates researchers had to have four independent observations of those classrooms by four separate people.  If put into practice in schools, that would consume an enormous amount of time and money.  In addition, administering, scoring, and combining the student survey also has real costs.

So, why are the Gates folks saying that their research shows the benefits of multiple measures of teacher effectiveness when their research actually suggests virtually no benefits to combining other measures with test scores and when there are significant costs to adding those other measures?  The simple answer is politics.  Large numbers of educators and a segment of the population find relying solely on test scores for measuring teacher effectiveness to be unpalatable, but they might tolerate a system that combined test scores with classroom observations and other measures.  Rather than using their research to explain that these common preferences for multiple measures are inconsistent with the evidence, the Gates folks want to appease this constituency so that they can put a formal system of systematically measuring teacher effectiveness in place.  The research is being spun to serve a policy agenda.

This spinning of the findings is not just an accident or the result of a misunderstanding.  It is clearly deliberate.  Throughout the two reports Gates just released, they consistently follow the same pattern of presenting the information.  They show that the classroom observation measures by themselves have weak reliability and validity in predicting effective teachers.  But if you add the student survey and then add the test score measures, you get much better measures of effective teachers.  This pattern of presentation suggests the importance of multiple measures, since the classroom observations are strengthened when other measures are added.  The only place you find the reliability and validity of test scores by themselves is near the end of the Research Paper in Tables 16 and 20.  If both the lay version and the technical reports had always shown how little test scores are improved by adding student surveys and classroom observations, it would be plain that test scores alone are just about as good as multiple measures.

The Gates folks never actually misstate their results (as Vicki Phillips did with the previous report).  But they are careful to frame the findings as consistently as possible with the Gates policy agenda of pushing a formal system of measuring teacher effectiveness that involves multiple measures.  And it worked, since the reporters are repeating this inaccurate spin of their findings.


(UPDATE — For a post anticipating responses from Gates, see here.)

Gates Foundation — Release the MET Results

October 25, 2011

A sketch of the $500 million new Gates Foundation headquarters

In the Wall Street Journal, Bill and Melinda Gates again mentioned the Measuring Effective Teachers (MET) project that their foundation is orchestrating.  Bill and Melinda may want to check on the status of the MET research they’ve been touting, since full results were promised in the spring of 2011 and have yet to be released.

Just to review… In an earlier interview with the Journal, MET was described as follows:

the Gates Foundation’s five-year, $335-million project examines whether aspects of effective teaching, classroom management, clear objectives, diagnosing and correcting common student errors can be systematically measured. The effort involves collecting and studying videos of more than 13,000 lessons taught by 3,000 elementary school teachers in seven urban school districts.

The motivation, reiterated in the new piece by Bill and Melinda Gates, is to identify what “works” in classroom teaching in order to develop systems that train and encourage other teachers to imitate those practices:

It may surprise you—it was certainly surprising to us—but the field of education doesn’t know very much at all about effective teaching. We have all known terrific teachers. You watch them at work for 10 minutes and you can tell how thoroughly they’ve mastered the craft. But nobody has been able to identify what, precisely, makes them so outstanding….

The intermediate goal of MET is to discover what we are able to measure that is predictive of student success. The end goal is to have a better sense of what makes teaching work so that school districts can start to hire, train and promote based on meaningful standards.

As I’ve argued before, using research to identify “best practices” in teaching only makes sense if the same teaching approaches would be desirable for the vast majority of teachers and students, regardless of the context.  And as I’ve also suggested before, I don’t believe this effort is likely to yield much in education.  Effective teaching is like effective parenting: it is highly dependent on the circumstances.  Yes, there are some parenting (and teaching) techniques that are generally effective for almost everyone, but those are mostly known and already in use.

This doesn’t mean we are completely unable to measure effective teaching (or parenting).  It just means that we have to judge it by the results and cannot easily make universal statements about the right methods for producing those results.  To make a sports analogy, there is no single “best practice” for hitters in baseball.  There are a variety of stances and swings.  The best way to judge an effective hitter is by the results, not by the stance or swing.  And if we tried to make all hitters stand and swing in the same way, we’d make a lot of them worse hitters.

It is because of this heterogeneity in effective teaching practices that I think the MET project is doomed to disappoint.  And I’ve heard from inside sources that results are being delayed because they are failing to produce much of anything.

According to the MET web site, the full results for the 1st year should have been released in the spring:

 In spring 2011, the project will release full results from the first year of the study, including predictors of teaching effectiveness and correlation with value-added assessments.

It is almost November and we have not seen these results.  I understand that in very large and complicated projects, like MET, things can take much longer than originally planned.  If so, it would be nice to hear that explanation.  It would be even nicer if the Gates Foundation released results if they have them, even if those results were not what they had hoped they would find.

Some inquisitive reporters should start asking Gates officials and members of the research team about the status of the MET results.  Reporters should go beyond talking to the media flacks at Gates HQ and actually talk to individual members of the team confidentially.  If they do that, they may confirm what I have been hearing: MET results have been delayed because they aren’t panning out.

(UPDATE: Gates responds.)

The Gates Foundation and the Rise of the Cool Kids

July 28, 2011

(Guest Post by Matthew Ladner)

Jay and Greg have been carrying on an important discussion concerning the Gates Foundation and education reform. I wanted to add a few thoughts.

Rick Hess and others have noted the “philanthropist as royalty” phenomenon in the past. Any philanthropist runs the danger of only hearing what they want to hear from their supplicants, and Gates as the largest private foundation runs the biggest risk. The criticism of the Gates Foundation I had seen in the past emanated from the K-12 reactionary fever swamp, hardly qualifying as constructive.

The challenge faced by philanthropists: how do you challenge your own assumptions and evaluate your own efforts honestly? Do you hire formidable Devil’s advocates to level their most skeptical case against your efforts?

I don’t know the answer to these questions, just that if I were Bill Gates I would be terrified of everyone telling me how right my thinking is because they want my money. This is, however, the best sort of problem to have…

Jay’s central critique of the Gates Foundation strategy seems to be that they have put too much faith in a centralized command and control strategy. They would be wise to entertain this thought. If command and control alone were the solution, then we wouldn’t have education problems: district, state, and federal governance have all failed to prevent widespread academic failure for decades.

The Gates strategy does, however, embrace decentralization. Over the years they have supported charter schools and fiercely opposed the worst one-size-fits-all policy of all: salary schedules and automatic/irrevocable tenure. Riley’s WSJ article makes clear that Gates understands the benefits of private school choice, but that he falls for the Jay Mathews fallacy of thinking it is just too politically difficult.

Sigh…perhaps next year Greg can make a dinner bet with Bill.

Gates is also the primary backer of Khan Academy. This new article on Sal Khan in Wired magazine makes clear that Khan understands the danger of being swallowed by school systems and that he is not going to allow it to happen. Khan Academy is both radically decentralized and in the early stages of being used by people within the centralized school system to improve outcomes.

Whatever the mistakes to date, the Gates Foundation has, in my mind, succeeded in serving as a counterweight to the NEA, mostly through funding the efforts of a sprawling network of reform organizations collectively known as the Cool Kids. Today, there is a struggle for power going on within the Democratic Party over K-12 policy, and the Gates Foundation deserves some credit for supporting the ideas behind the “Democrat Spring” on education policy. This spring is following more of the Syrian than the Egyptian model thus far, but it is happening, and it is very important.

Does that mean that they are the “good guys” and Jay should lay off of them? Of course not: reasoned critiques of large philanthropists are in short supply for all of the reasons cited above. Jason Riley wished that Gates were bolder in embracing decentralization reforms, but noted that in the end it was the Gates rather than the Riley Foundation. This is absolutely true, but it doesn’t make the royalty problem go away, and leaves a continuous question of how the emperor gets feedback on his new clothes.

I don’t agree with the Cool Kids about everything. The next time I hear someone ask a question about having Common Core replace NAEP (the very pinnacle of naive folly), for instance, I may pull out entire tufts of my graying, thinning hair in utter exasperation. Reformers of all stripes need to be on guard against the ship-wheel conceit, which is to imagine that if only my strong hands steered the ship, we’d sail through the rocky shoals of ed reform without a hitch.

The East Germans ran a much better economy than the North Koreans, much to the benefit of Germans and to the detriment of Koreans. This is real and important in human terms; I do not make this point glibly. I never heard about an East German famine decimating the population, but food shortages have left even soldiers starving to death in North Korea (pity the women and children). Better-quality management is good and desirable, but…it will only take you so far. Today, Chinese apparatchiks are noisily crediting themselves for the tremendous economic progress in China without the slightest hint of irony. Without the market forces Deng introduced and with more apparatchiks, China would revert to a starving backwater. With fewer apparatchiks, her progress would almost certainly accelerate.

As Sara Mead correctly noted in this guest post at Eduwonk, today’s education debate largely involves a mixture of technocratic and market-based reforms (neo-liberals) on one side and a group of reactionaries lacking realistic solutions on the other. A third of our 4th graders can’t read and have been shoved into the dropout pipeline. We need both technocratic and market based reforms, and we need stronger reforms of both sorts than those fielded to date.

Jay’s critique concerns the right mix of reforms within the bounds of the neo-liberal consensus. This of course is a matter of debate, and debate is the path to deeper understanding. The sheer size of the Gates Foundation has the potential to stifle such debate as it relates to their efforts, even passively, and reformers should recognize the danger in allowing it to do so. This isn’t about them so much as it is about us.

Gates Foundation Follies (Part 2)

July 26, 2011

A sketch of the $500 million new Gates Foundation headquarters

In Part 1 of this post, I described how the Gates Foundation came to recognize the importance of using political influence to reform the education system rather than focusing on reforming one school at a time in the hopes that school systems would see and replicate successful models.  No private philanthropist has enough money to buy and sustain widespread adoption of an effective approach, and the public school system has little incentive to identify and spread effective approaches on its own.

Faced with the unwillingness of the public school system to reproduce successful models (assuming that Gates could even offer one), the Foundation was left with two solutions to encourage innovation: 1) identify the best practices themselves and impose them from the top down, or 2) encourage choice and competition so that schools would have the proper incentive to identify, imitate, and properly implement effective approaches.

The Gates Foundation made the wrong choice.  Their top-down strategy cannot work for the following reasons:

1) Education does not lend itself to a single “best” approach, so the Gates effort to use science to discover best practices is unable to yield much productive fruit;

As I’ve explained before, there are many different “best” techniques for different kinds of teachers with different kinds of students in different situations with different available resources.  There are some practices that are universally beneficial in education, but they tend to be pretty obvious and are already well known (e.g., it is bad to beat kids, it is better when teachers know the material they are teaching, it is helpful to break down ideas into their essential components, etc.).

The difficulty of discovering universally beneficial practices that are not already well known, especially with the blunt tools available to researchers, probably helps explain why the Measuring Effective Teachers (MET) project, on which the Gates Foundation is spending $335 million, has yet to produce any meaningful results despite entering its third year of operation.

2) As a result, the Gates folks have mostly been falsely invoking science to advance practices and policies they prefer for which they have no scientific support;

Despite having nothing to show for the $335 million they are spending on MET, the Gates folks nevertheless claim that it “proves” the harmfulness of teachers engaging in “drill and kill.” The fact that the research showed no such thing did not deter them from telling the NY Times and LA Times that it did.  Even when I pointed out the error, the Gates folks refused to issue a correction (although the LA Times ran one on their own).

Similarly, the Gates-orchestrated effort to push national standards, curricular materials, and assessments is advancing without any scientific evidence of the desirability of these approaches.  Gathering a group of Checker Finn’s friends (er, I mean, “a panel of experts”) to attest that the Common Core standards are better is not science.  It is the false invocation of science to manipulate people into compliance with their agenda.

3) Attempting to impose particular practices on the nation’s education system is generating more political resistance than even the Gates Foundation can overcome, despite their focus on political influence and their devotion of significant resources to that effort;

Opponents of centralized control of education have begun to mobilize against the Gates-orchestrated effort to establish national standards, curricular materials, and assessments.  But the bulk of the political resistance to the Gates strategy will come from the teacher unions.  They don’t want anyone to infringe on their autonomy or place their interests in jeopardy with a nationalized accountability system.  They may play along with Gates for a while and take their money, but when push comes to shove the unions can only tolerate one dictator in education — the unions.  Of course, those of us who don’t want anyone centrally-controlling the nation’s education system will oppose both Gates and the teacher unions.

We already have a taste of the kind of resistance teacher unions will put up against the Gates nationalization effort in the slogans emanating from Diane Ravitch and Valerie Strauss’ Twitter feed, supported by their Army of Angry Teachers.  Falsely claiming that MET proved that drill and kill is harmful did not mollify these folks at all.

The teacher unions derive far more power and money from the status quo than Gates can ever offer them, unless of course Gates builds a nationalized system and cedes control to the unions, which is not part of the Gates plan.  Nothing in the Gates strategy weakens the unions or would force them to make significant concessions, so in the end the unions will either hijack the Gates strategy for their own benefit or block it.  Even Gates does not have the resources to beat the unions without first diminishing their power.

4) The scale of the political effort required by the Gates strategy of imposing “best” practices is forcing Gates to expand its staffing to levels where it is being paralyzed by its own administrative bloat; 

Over the last decade the Gates Foundation has roughly doubled its assets but increased its staffing about 10-fold.  The Foundation is now huge, which is part of why it needs the Education Pentagon pictured above to house everyone.  The Foundation has gotten huge because it is trying to buy political influence by buying people.  Gates has been snapping up or funding just about every advocacy group, researcher, or education journalist they can find.  Getting all of these people on board for a nationalized education system (or at least muting their dissent) involves paying an enormous number of people and organizations.

Gates can buy a lot of folks, but they can’t buy everyone and they can’t keep the folks they do pay in line for very long.  It’s like herding cats.  (I should note that I’ve received Gates funding in the past.)

And the sheer size of their staff and funded allies along with the focus on controlling the political message is so overwhelming that it is significantly hindering their ability to do anything.  People inside the organization have told me that they are suffering from a bureaucratic gridlock with endless meetings, conference calls, and chains of approvals.  Notice that Gates is paying a ton of researchers and yet virtually no research is coming out.  Very curious.

5) The false invocation of science as a political tool to advance policies and practices not actually supported by scientific evidence is producing intellectual corruption among the staff and researchers associated with Gates, which will undermine their long-term credibility and influence.

As noted above, the need to advance a particular political message has led Gates to mischaracterize their own research (for example, claiming that MET proves that drill and kill is harmful when the research does not show that).  But the intellectual corruption extends much farther.  I had a highly respected and accomplished researcher employed by Gates tell me that Vicki Phillips’ mischaracterization of the MET results was not so far off because there isn’t a big difference between a low correlation and a negative one.  He also defended comparing the magnitude of a series of pair-wise correlations to determine the relative influence of different variables.  To hear someone who knows better twist the truth to avoid contradicting the education boss at Gates was just sad.

Unfortunately, too many advocates, researchers, and others are being similarly corrupted.  In most cases the Gates folks don’t have to exert any explicit pressure on people to keep them in line; they just anticipate what they think would serve the Gates strategy.  But I am aware of at least one case in which a researcher’s findings were at odds with the desired outcome and that person suffered for it.

I’ve heard another story from someone involved in the MET project that the delay in releasing any results from the analyses of classroom videos even as the project enters its third year is explained by their inability to find any meaningful results.  Perhaps another year of data will make something turn up that they can finally tout for their $335 million investment.  The fact that the initial MET report with basically no useful findings was released on a Friday just before Christmas suggests that the Gates folks are working hard to shape their message.

The national standards, curriculum, and testing campaign is rife with intellectual corruption.  For example, people are twisting themselves into knots to explain how the effort is purely voluntary on the part of states when it is manifestly not, given federal financial “incentives,” offers of selective exemptions to NCLB requirements for states that comply, and the threat of future mandates.  There is so much spin around Gates that it makes one dizzy.


Let me be clear: most of the folks affiliated with Gates are good and smart people.  The problem is that when your reform strategy requires a top-down approach, these good and smart people are put under a lot of stress to have a unified vision of the “best” that will be imposed from the top.  And whenever an organization starts sprinkling millions of dollars on researchers and advocacy groups unaccustomed to that kind of money, there are temptations that are hard for the most virtuous to resist.

But the good and smart people at Gates can stop the counter-productive strategy that the Foundation is pursuing.  The Foundation changed course once before and it can do it again.


UPDATE — For my suggestions of what the Gates Foundation could do instead, see this post.

Gates Foundation Follies (Part 1)

July 25, 2011

A sketch of the $500 million new Gates Foundation headquarters

Jason Riley’s interview with Bill Gates in the Wall Street Journal was not as great as Riley’s interview with me last week (shameless plug for my new mini-book), but it was still very illuminating.  In particular, the Gates interview confirmed two things about the Foundation’s education efforts: 1) they’ve realized that the focus of their efforts has to be on the political control of schools and 2) they are uninterested in using that political influence to advance market forces in education. Instead, the basic strategy of the Gates Foundation is to use science (or, more accurately, the appearance of science) to identify the “best” educational practices and then use political influence to create a system of national standards, curricular materials, and testing to impose those “best practices” on schools nationwide.

The Gates Foundation came to understand the necessity of political influence over schools with the failure of their previous small schools strategy.  Under that strategy they tried to achieve reform by paying school districts to break up larger high schools into smaller ones.  The problem with that strategy is that even the Gates Foundation does not have nearly enough money to buy systemic reform one school at a time.

School districts currently spend over $600 billion per year and the Gates Foundation only has $34 billion in total assets.  With the practice of spending only about 5% of assets each year and given the large (and effective) efforts the Foundation makes in developing country health-care, Gates spends only a couple hundred million dollars on education reform each year.  Given the small share of total education spending Gates could offer, most public districts refused to entertain the Gates strategy of smaller schools, others took the money but failed to implement it properly, and still others reversed the reform once the Gates subsidies ended.

In my chapter “Buckets into the Sea” in the 2005 book With the Best of Intentions, edited by Rick Hess, I described the situation this way:

Philanthropists simply don’t have enough resources to reshape the education system on their own; all their giving put together amounts to only a tiny fraction of total education spending, so their dollars alone can’t make a significant difference.  In order to make a real difference, philanthropists must support programs that redirect how future public education dollars are spent.

And in 2008 I repeated this claim, saying: “total private giving to public education is a tiny portion of total spending on schools.  All giving, from the bake sale to the Gates Foundation, makes up less than one-third of 1% of total spending.  It’s basically rounding error.”

I don’t know whether the Gates Foundation was influenced by my writing or whether they arrived at the same conclusions independently, but they are now articulating those same conclusions, sometimes in the very same words:

“It’s worth remembering that $600 billion a year is spent by various government entities on education, and all the philanthropy that’s ever been spent on this space is not going to add up to $10 billion. So it’s truly a rounding error.”

This understanding of just how little influence seemingly large donations can have has led the foundation to rethink its focus in recent years. Instead of trying to buy systemic reform with school-level investments, a new goal is to leverage private money in a way that redirects how public education dollars are spent.

While the focus of the Gates Foundation on influencing education policy is sensible, the particular political approach they have chosen is doomed to fail and attempting it is likely to be counter-productive.  In Part 2 of this post I will explain how the new strategy Gates has decided to pursue is flawed.

To give you a taste of what is coming in Part 2, the arguments can be summarized as: 1) Education does not lend itself to a single “best” approach, so the Gates effort to use science to discover best practices is unable to yield much productive fruit; 2) As a result, the Gates folks have mostly been falsely invoking science to advance practices and policies they prefer for which they have no scientific support; 3) Attempting to impose particular practices on the nation’s education system is generating more political resistance than even the Gates Foundation can overcome, despite their focus on political influence and their devotion of significant resources to that effort; 4) The scale of the political effort required by the Gates strategy of imposing “best” practices is forcing Gates to expand its staffing to levels where it is being paralyzed by its own administrative bloat; and 5) The false invocation of science as a political tool to advance policies and practices not actually supported by scientific evidence is producing intellectual corruption among the staff and researchers associated with Gates, which will undermine their long-term credibility and influence.

Tune in for Part 2.


UPDATE — For my suggestions of what the Gates Foundation could do instead, see this post.

Common Core and the Underpants Gnomes

January 27, 2014

It’s amazing how some very smart people can commit billions of dollars and untold human effort to something like Common Core without having thought the thing through.  How exactly did they think this was going to work?  Didn’t they have meetings?  Didn’t someone have to write a paper articulating the theory of change?  Didn’t any of them ever take political science classes or read a book on interest group behavior?

As I have repeatedly said would eventually happen, the teacher unions are turning against Common Core in New York and threatening to do the same in other states if high stakes tests aligned to those standards are put in place.  And the unions are more powerful, better organized, and even better-funded than the Gates Foundation and their mostly DC-based defenders of Common Core.  So Common Core will either have to drop the high-stakes tests meant to compel teachers and schools to implement the standards, or Common Core will become yet another set of empty words in a document, like most sets of standards before them.

Here is what I expected would happen and I believe is coming true:

As I have written and said on numerous occasions, Common Core is doomed regardless of what I or the folks at Fordham say or do.  Either Common Core will be “tight” in trying to compel teachers and schools through a system of aligned assessments and meaningful consequences to change their practice.  Or Common Core will be “loose” in that it will be a bunch of words in a document that merely provide advice to educators.

Either approach is doomed.  If Common Core tries being tight by coercing teachers and schools through aligned assessments and consequences, it will be greeted by a fierce organized rebellion from educators.  It’ll be Randi Weingarten, Diane Ravitch and their army of angry teachers who will drive a stake through the heart of Common Core, not me or any other current critic.  If Common Core tries being loose, it will be like every previous standards-based reform: a bunch of empty words in a document that educators can promptly ignore while continuing to do whatever they were doing before.

This is the impossible paradox for Common Core.  To succeed, it requires more centralized coercion than is possible (or desirable) under our current political system, and more than organized educators will allow.  And if it doesn’t try to coerce unwilling teachers and schools, it will produce little change.

How did the political strategists at Gates and their DC advocates think this doom would be avoided?  Did they imagine that teachers and schools were starving for a good set of standards and would just embrace them once they were issued from the DC Temple in which they were written?  Did they think teachers and their unions wouldn’t politically resist an effort to compel compliance with Common Core through high stakes tests?  Did they think they could sneak up on teachers and unions and implement the whole thing before anyone would object?

I suspect that their thinking was something like that of the Underpants Gnomes from South Park, whose business plan for profiting from stealing underpants from kids’ drawers during the night is missing a step: “Phase 1: Collect underpants.  Phase 2: ?  Phase 3: Profit.”  The Gates/Fordham/College Board plan must have been: Phase 1: Write standards.  Phase 2: Incentivize states with federal carrots and sticks to engage in the empty gesture of adopting standards.  Phase 3: ?  Phase 4: Learning improves.

Even now I’d love to hear someone try to articulate Common Core’s theory of change.  And it is not sufficient to say that this is just the “hard work” of persuading teachers and schools.  It is also hard work to jump to the moon; it is so hard that it is impossible.  And I don’t want to hear “Remember: Undoing the #CommonCore would require 46 separate, state-led actions…”  That’s true, but states have many worthless pieces of legislation on the books that do little to change the world.  Thirteen states still have anti-sodomy laws even though the Supreme Court struck down that type of law.

I don’t think Gates, Fordham, or anyone else really developed a plausible theory of change for Common Core.  Instead, I think they just had the type of magical thinking too common among smart DC policy analysts that if only they had good enough intentions and “messaged” the issue just right, all problems would be overcome.  Tell that to the ObamaCare folks who thought that good intentions and artful “messaging” would somehow repeal the law of adverse selection in who would sign up for the risk pools.  Our technocratic minds cannot control the behavior of other people just by thinking hard, wanting good things, and talking about it a lot.

Art Research Publications

November 24, 2013


In the past week there’s been a flurry of articles featuring our art research.  Educational Researcher has a new piece by Dan Bowen, Brian Kisida, and me on how field trips to an art museum affect students’ critical thinking.  This article is a more technical and focused follow-up to our piece in Education Next.

Psychology of Music has a new study by Lisa Margulis, Brian Kisida, and me on how information affects the student experience when seeing a live musical performance.  In particular, we experimentally gave some students a program note with information about the show and others a note with information about the venue but nothing about the show.  We then measured student enjoyment and knowledge.

And most recently, the New York Times today published a piece by Brian, Dan, and me summarizing our study of field trips to an art museum.

Nothing seems to generate a buzz of discussion on Twitter, Facebook, and email quite like a New York Times article.  Let’s hope it all leads to more research and thinking about the importance of art in education.

And let’s hope it counteracts the wrong-headed view, most prominently articulated by Bill Gates, that devoting resources to the arts represents mistaken priorities.  Terry Teachout has a devastating rebuttal to Gates in Thursday’s Wall Street Journal.  In part Teachout writes:

… it seems clear to me that Mr. Gates thinks it immoral for rich people to give money to museums instead of medical projects, presumably those that have received the official Bill Gates Seal of Moral Approval. To be sure, he deserves full credit for putting his own money where his mouth is: The Bill & Melinda Gates Foundation gives away some $4 billion a year, much of which is used to support health-related initiatives in developing countries, including a world-wide initiative to stamp out polio.

Good for him—but when it comes to art, he’s got it all wrong, and then some.

It almost embarrasses me to restate for Mr. Gates’s benefit what most civilized human beings already take to be self-evident, which is that art museums, like symphony orchestras and drama companies and dance troupes, make the world more beautiful, thereby making it a better place in which to live. Moreover, the voluntary contributions of rich people help to ensure the continued existence of these organizations, one of whose reasons for existing is to make it possible for people who aren’t rich to enjoy the miracle that is art. If it weren’t for museums, you wouldn’t get to see any of the paintings of Rembrandt and Monet and Jackson Pollock (and, yes, Francis Bacon). Instead they’d be hanging in homes whose owners might possibly deign to open their doors to the public once a year. Maybe.

It is, as they say, a free country, and rich people get to do whatever they want with their money. They can spend it on paintings or children’s hospitals or beach houses. But the surprising thing—or maybe not—is that so many of them believe in helping to make the world a better place for their fellow men.

Nor do I hear any groundswell of support among the rich for Mr. Gates’s rigidly utilitarian view of charity. Perhaps that’s because the desire to partake of beauty is so deeply rooted in the human soul. Flip through a book of quotations and you’ll see an abundance of testimony to its lasting importance throughout the whole of recorded history. I especially like what Somerset Maugham said in his novel “Cakes and Ale”: “Beauty is an ecstasy; it is as simple as hunger.” So it is, and sooner or later most of us will long for it as we do for food. What could be more honorable than for a rich person to help satisfy that hunger in the same way that he might underwrite the operation of a food bank?

Indeed, many philanthropic organizations see no need to choose. The Doris Duke Charitable Foundation, for example, supports the performing arts and medical research.

Think that over the next time you feel inclined to sputter with rage over the results of the latest big-ticket art auction. While you’re at it, remember that in the long run, the chances are very, very good that the paintings for which “some rich guy dropped millions” will end up in a museum, perhaps even one that, like New York’s Frick Collection or the Phillips Collection in Washington, was built by the rich guy in question. And think about this as well: Of course it’s admirable to help prevent blindness—but it’s also admirable to help ensure that we have beautiful things to see.

A Chance for a New Fordham

November 4, 2013

Fordham’s Kathleen Porter-Magee has responded to my post last week, in which I argued that Fordham’s vision of Common Core as “tight-loose” is looking a lot more like “tight-tight.”  In her rejoinder, Kathleen Porter-Magee reiterates the distinction between standards and curriculum and insists that “good standards aren’t prescriptive, but they’re not agnostic, either.”

But just a week earlier, in the foreword to Fordham’s new study judging the extent to which English teachers are changing instruction to meet Common Core, Kathleen and Checker talk about the “instructional shifts” Common Core standards “expect” and “demand.”  Now we are asked to believe that there is a world of difference between “prescribing” and “expecting” or “demanding.”

If this is beginning to sound like debating what the meaning of the word “is” is, there is a reason.  Almost everything coming out of Fordham (and a great many other DC think-tanks and advocacy groups) feels more like political campaign rhetoric than serious intellectual inquiry.  Rick Hess described Kathleen Porter-Magee’s rejoinder, saying it “read to me like a pol’s answer.”  Precisely.  It is a politician’s answer because the folks at Fordham (and many other DC policy shops) too often behave, talk, and write more like politicians than scholars or serious policy analysts.

My goal in critiquing Fordham (and the Gates Foundation) is to encourage them to behave less like politicians and more like scholars and serious policy analysts.  Kathleen Porter-Magee misunderstands my motivation, suggesting that I am trying to “undermine the credibility of [my] opponents” on Common Core so I “can win the day—facts be damned.”

But the truth is that I am under no delusion that what I write or say will have any effect on the fate of Common Core, nor do I really care about having such an effect.  As I have written and said on numerous occasions, Common Core is doomed regardless of what I or the folks at Fordham say or do.  Either Common Core will be “tight” in trying to compel teachers and schools through a system of aligned assessments and meaningful consequences to change their practice.  Or Common Core will be “loose” in that it will be a bunch of words in a document that merely provide advice to educators.

Either approach is doomed.  If Common Core tries being tight by coercing teachers and schools through aligned assessments and consequences, it will be greeted by a fierce organized rebellion from educators.  It’ll be Randi Weingarten, Diane Ravitch and their army of angry teachers who will drive a stake through the heart of Common Core, not me or any other current critic.  If Common Core tries being loose, it will be like every previous standards-based reform: a bunch of empty words in a document that educators can promptly ignore while continuing to do whatever they were doing before.

This is the impossible paradox for Common Core.  To succeed, it requires more centralized coercion than is possible (or desirable) under our current political system, and more than organized educators will allow.  And if it doesn’t try to coerce unwilling teachers and schools, it will produce little change.

If Common Core is doomed, why do I bother responding to Fordham, Gates, and others making arguments in its favor?  I am responding to the intellectual corruption that the political campaign for Common Core is producing among otherwise decent, smart, and well-intentioned folks.  Arguments like “tight-loose” are political campaign slogans, not intellectually serious ideas.  I’m trying to point this out, not “win the day” on the merits of Common Core.  I pick on Fordham because I am actually in substantive agreement with a good deal of what they are trying to accomplish and don’t want to see them pursue those goals with crappy political slogans.

But with Mike Petrilli assuming the presidency of the Fordham Institute next year, I see hope for a new Fordham.  He might start by hiring more social scientists and fewer former journalists and office-holders.  Policy analysis isn’t entirely about “messaging” to convince people to do what we already know is right.  There is a lot we don’t know and competing social science claims we need to adjudicate, so a good policy organization needs a bunch of people with content and research method expertise.  You can’t just rent this expertise on the cheap; you need to hire social scientists to make this expertise a stronger part of the organization’s DNA.  Look at Brookings, EPI, and AEI for models across the political spectrum that give priority to social science.

Mike might also consider diversifying support away from the Gates Foundation.  With more than $6 million from Gates in the last few years and with the appointment of former Gates political strategist, Stefanie Sanford, to the Fordham board, Fordham is beginning to feel like a wholly-owned subsidiary of the Gates Foundation.  I don’t think Fordham is advocating for anything they don’t believe because of Gates support, but I do think Gates is a corrupting influence that tries to make everything part of a political campaign rather than serious, honest inquiry.  Reducing reliance on Gates might free Fordham up to sound less like a string of political slogans.

To accomplish less reliance on Gates, Fordham might need to shrink a bit in size.  That would probably be a good thing.  A policy shop shouldn’t try to maximize its budget or head-count.  It should try to be the right size to do the work it wants to do.  Not chasing every dollar to become ever-larger would also free up Fordham to speak only when it wants to and not feel obliged to produce reports, tweets, and blog posts all of the frickin’ time.  A lower volume of communication might produce higher quality communication and probably increased influence.

Lastly, a shift away from the political obsession of journalists and former office-holders and toward a more serious, social scientific approach would help Fordham avoid crappy research and slogans.  Fordham should avoid doing any expert panel studies giving grades to this or that.  It should avoid analyses that select on the dependent variable, such as exploring why Massachusetts, Finland, or anyone else is doing well.  It should avoid repeating the Fordham drinking game in which arguments depend on appending “smart” to regulation, curriculum, etc., or on dividing policies into three kinds where the middle one is the sensible alternative to two extremes.  Messaging is not really an argument.

One thing Fordham should not change is its principles and its sincere commitment to Common Core.  Contrary to Kathleen Porter-Magee’s assumptions, I am not trying to convince Fordham to change its position on Common Core. I just want Fordham not to confuse political campaigns for policy analysis.  Whatever happens with Common Core (and who knows, perhaps Fordham is right in thinking it is a great idea and will somehow help), we cannot degrade the currency of policy analysis by turning everything into an advocacy campaign.  Education reform is likely to be a very long game, so we don’t want to bend all rules, twist all facts, and pull out all stops just to win this one battle.  It would be nice to have a credible and effective Fordham around for the next ed reform debate.  I hope Mike Petrilli can help do this.

More Research Showing Small Schools Work, Gates Remains Silent

October 23, 2013

With the support of the Gates Foundation, New York City created 150 small schools of choice between 2002 and 2008.  Five previous rigorous studies of this program and other small school initiatives have demonstrated significant benefits for students.  Now we have a sixth study from the School Effectiveness and Inequality Initiative at MIT.

The authors, Atila Abdulkadiroglu, Weiwei Hu, and Parag Pathak, are economists at Duke and MIT.  They use the lotteries that determine admission to these non-selective small schools of choice to conduct a random-assignment experiment.  The full study can be read here, but it does not allow me to cut and paste text to summarize the results.  According to the press release:

The study follows cohorts of rising 9th graders for five application years from 2003-04 through 2007-08. For these students, small schools boost performance across all five major Regents exams: Math, English, Living Environment, Global History, and US History.  Students randomly offered a seat at a small school accumulate 1.4 more credits per year, attend school for 4 more days each year, and are 9% more likely to receive a high school diploma. 
As the cohorts have aged, it is now possible to measure the effects of small schools on college enrollment and choice, outcomes that have never been examined before.   Compared to the college enrollment rate of 37% for those not offered, students at small schools are 7% more likely to attend college and 6% more likely to attend a four-year college.  Most of these gains come at four-year public institutions.  There is a marked 7% increase in the fraction of students who enroll in the CUNY system. Small schools cause students to clear CUNY remediation requirements in writing or reading.  The early evidence suggests that students are more likely to persist in college, as measured by attempting at least two academic semesters.  Students in the lottery study are too young to say anything definitive about college graduation. 
A major innovation in the study is its use of information contained in NYC’s Learning Environment Surveys to characterize the small school environment for those in the experiment.  Small schools are rated higher than fallback schools by student survey respondents on the overwhelming majority of questions on engagement, safety and respect, academic expectations, and communication.  Surveys indicate that students feel safer and have closer interactions with their peers and teachers, despite reporting a smaller variety of course offerings and activities.  Teachers indicate greater feedback, increased safety, and improved collaboration.
This research was supported by the National Science Foundation. The research team includes Atila Abdulkadiroglu, Professor of Economics, Weiwei Hu, PhD Candidate at Duke University and Parag Pathak, Associate Professor of Economics at MIT and SEII Director.  The study uses data provided by the New York City Department of Education.  The findings are being released in the National Bureau of Economic Research working paper series this week. 
The study uses an innovative research design based on admissions lotteries contained in the high school match.  The lottery-based research design relies on apples-to-apples comparisons: among those who apply to a given set of small schools, applicants who were randomly offered are compared with otherwise similar students who were not offered a seat.  The study covers more than 108 oversubscribed high school programs with 9th grade entry, which represent 70% of unselective small high schools opened between 2002-2008.
“These results indicate important possibilities for urban small schools reform,” said Pathak.  “The collaboration partnership between key stakeholders in New York City shows that within-district reform strategies can substantially improve student achievement.”

Despite more proof that the small schools of choice reform strategy pursued by the Gates Foundation before 2006 has been a clear success, the Gates Foundation has nothing to say about these positive results.  I can find nothing from their massive press machine touting the results: nothing on their web site, nothing on their Twitter feed, no well-placed stories in the NY Times or LA Times.  Those efforts are reserved for their new, unproven, and misguided strategy of top-down reform through Common Core and measuring and incentivizing teacher performance.

Let’s hope that the Gates Foundation and its followers are not impervious to evidence and will reconsider their abandonment of the small schools of choice reform strategy.

