Head Start Revealed

January 14, 2013

Despite the obvious effort to delay and conceal the disappointing results from the official and high quality evaluation of Head Start, the Wall Street Journal shines the light on the issue in today’s editorial.  DC’s manipulating scumbags might want to take note that efforts to hide negative research might just draw more attention.  It’s comforting to see that the world may sometimes look more like Dostoevsky’s Crime and Punishment than Woody Allen’s Crimes and Misdemeanors.

The Journal reveals that Head Start supporters have not only ignored the latest study, but they are trying to sneak an extra $100 million for Head Start into the relief package for victims of Hurricane Sandy.  They also note that the most recent disappointing Head Start result is just the latest in a string of studies failing to find benefits from the program despite a cumulative expenditure of more than $180 billion.

And then the Journal finishes with this:

The Department of Health and Human Services released the results of the most recent Head Start evaluation on the Friday before Christmas. Once again, the research showed that cognitive gains didn’t last. By third grade, you can’t tell Head Start alumni from their non-Head Start peers.

President Obama has said that education policy should be driven not by ideology but by “what works,” though we have to wonder given his Administration’s history of slow-walking the release of information that doesn’t align with its agenda.

In 2009, the Administration sat on a positive performance review of the Washington, D.C., school voucher program, which it opposes. The Congressionally mandated Head Start evaluation put out last month was more than a year late, is dated October 2012 and was released only after Republican Senator Tom Coburn and Congressman John Kline sent a letter to HHS Secretary Kathleen Sebelius requesting its release along with an explanation for the delay. Now we know what was taking so long.

Like so many programs directed at the poor, Head Start is well-intentioned, and that’s enough for self-congratulatory progressives to keep throwing money at it despite the outcomes. But misleading low-income parents about the efficacy of a program is cruel and wastes taxpayer dollars at a time when the country is running trillion-dollar deficits.

A government that cared about results would change or end Head Start, but instead Congress will use the political cover of disaster relief to throw more good money after proven bad policy.

[UPDATE: And here is a good follow-up op-ed on the study by Lindsey Burke on the Fox News web site.]

What Success Would Have Looked Like

January 10, 2013

Yesterday I described the Gates Foundation’s Measuring Effective Teachers (MET) project as “an expensive flop.”  To grasp just what a flop the project was, it’s important to consider what success would have looked like.  If the project had produced what Gates was hoping, it would have found that classroom observations were strong, independent predictors of other measures of effective teaching, like student test score gains.  Even better, they were hoping that the combination of classroom observations, student surveys, and previous test score gains would be a much better predictor of future test score gains (or of future classroom observations) than any one of those measures alone.  Unfortunately, MET failed to find anything like this.

If MET had found classroom observations to be strong predictors of other indicators of effective teaching and if the combination of measures were a significantly better predictor than any one measure alone, then Gates could have offered evidence for the merits of a particular mixing formula or range of mixing formulas for evaluating teachers.  That evidence could have been used to good effect to shape teacher evaluation systems in Chicago, LA, and everywhere else.

They also could have genuinely reassured teachers anxious about the use of test score gains in teacher evaluations.  MET could have allayed those concerns by telling teachers that test score gains produce information that is generally similar to what is learned from well-conducted classroom observations, so there is no reason to oppose one and support the other.  What’s more, significantly improved predictive power from a mixture of classroom observations with test score gains could have made the case for why we need both.

MET was also supposed to have helped us adjudicate among several commonly used rubrics for classroom observations so that we would have solid evidence for preferring one approach over another.  Because MET found that classroom observations in general are barely related to other indicators of teacher effectiveness, the study told us almost nothing about the criteria we should use in classroom observations.

In addition, the classroom observation study was supposed to help us identify the essential components of effective teaching.  That knowledge could have informed improved teacher training and professional development.  But because MET was a flop (because classroom observations barely correlate with other indicators of teacher effectiveness and fail to improve the predictive power of a combined measure), we haven’t learned much of anything about the practices that are associated with effective teaching.  If we can’t connect classroom observations with effective teaching in general, we certainly can’t say much about which particular aspects of the observed teaching contributed most to effective teaching.

Just so you know that I’m not falsely attributing to MET these goals that failed to be realized, look at this interview from 2011 of Bill Gates by Jason Riley in the Wall Street Journal.  You’ll clearly see that Bill Gates was hoping that MET would do what I described above.  It failed to do so.  Here is what the interview revealed about the goals of MET:

Of late, the foundation has been working on a personnel system that can reliably measure teacher effectiveness. Teachers have long been shown to influence students’ education more than any other school factor, including class size and per-pupil spending. So the objective is to determine scientifically what a good instructor does.

“We all know that there are these exemplars who can take the toughest students, and they’ll teach them two-and-a-half years of math in a single year,” he says. “Well, I’m enough of a scientist to want to say, ‘What is it about a great teacher? Is it their ability to calm down the classroom or to make the subject interesting? Do they give good problems and understand confusion? Are they good with kids who are behind? Are they good with kids who are ahead?’

“I watched the movies. I saw ‘To Sir, With Love,'” he chuckles, recounting the 1967 classic in which Sidney Poitier plays an idealistic teacher who wins over students at a roughhouse London school. “But they didn’t really explain what he was doing right. I can’t create a personnel system where I say, ‘Go watch this movie and be like him.'”

Instead, the Gates Foundation’s five-year, $335-million project examines whether aspects of effective teaching—classroom management, clear objectives, diagnosing and correcting common student errors—can be systematically measured. The effort involves collecting and studying videos of more than 13,000 lessons taught by 3,000 elementary school teachers in seven urban school districts.

“We’re taking these tapes and we’re looking at how quickly a class gets focused on the subject, how engaged the kids are, who’s wiggling their feet, who’s looking away,” says Mr. Gates. The researchers are also asking students what works in the classroom and trying to determine the usefulness of their feedback.

Mr. Gates hopes that the project earns buy-in from teachers, which he describes as key to long-term reform. “Our dream is that in the sample districts, a high percentage of the teachers determine that this made them better at their jobs.” He’s aware, though, that he’ll have a tough sell with teachers unions, which give lip service to more-stringent teacher evaluations but prefer existing pay and promotion schemes based on seniority—even though they often end up matching the least experienced teachers with the most challenging students.

The final MET reports produced virtually nothing that addressed these stated goals.  But in Orwellian fashion, the Gates folks have declared the project to be a great success.  I never expected MET to work because I suspect that effective teaching is too heterogeneous to be captured well by a single formula.  There is no recipe for effective teaching because kids and their needs are too varied, teachers and their abilities are too varied, and the proper matching of student needs and teacher abilities can be accomplished in many different ways.  But this is just my suspicion.  I can’t blame the Gates Foundation for trying to discover the secret sauce of effective teaching, but I can blame them for refusing to admit that they failed to find it.  Even worse, I blame them for distorting, exaggerating, and spinning what they did find.

(edited for typos)

Understanding the Gates Foundation’s Measuring Effective Teachers Project

January 9, 2013

If I were running a school I’d probably want to evaluate teachers using a mixture of student test score gains, classroom observations, and feedback from parents, students, and other staff.  But I recognize that different schools have different missions and styles that can best be assessed using different methods.  I wouldn’t want to impose on all schools in a state or the nation a single, mechanistic system for evaluating teachers since that is likely to be a one size fits none solution.  There is no single best way to evaluate teachers, just like there is no single best way to educate students.

But the folks at the Gates Foundation, afflicted with PLDD, don’t see things this way.  They’ve been working with politicians in Illinois, Los Angeles, and elsewhere to centrally impose teacher evaluation systems, but they’ve encountered stiff resistance.  In particular, they’ve noticed that teachers and others have expressed strong reservations about any evaluation system that relies too heavily on student test scores.

So the folks at Gates have been trying to scientifically validate a teacher evaluation system that involves a mix of test score gains, classroom observations, and student surveys so that they can overcome resistance to centrally imposed, mechanistic evaluation systems.  If they can reduce reliance on test scores in that system while still carrying the endorsement of “science,” the Gates folks imagine that politicians, educators, and others will all embrace the Gates central planning fantasy.

Let’s leave aside for the moment the political reality, demonstrated recently in Chicago and Los Angeles, that teachers are likely to fiercely resist any centrally imposed, mechanistic evaluation system regardless of the extent to which it relies on test scores.  The Gates folks want to put on their lab coats and throw the authority of science behind a particular approach to teacher evaluation.  If you oppose it you might as well deny global warming.  Science has spoken.

So it is no accident that the release of the third and final round of reports from the Gates Foundation’s Measuring Effective Teachers project was greeted with the following headline in the Washington Post: “Gates Foundation study: We’ve figured out what makes a good teacher,”  or this similarly humble claim in the Denver Post: “Denver schools, Gates foundation identify what makes effective teacher.”  This is the reaction that the Gates Foundation was going for — we’ve used science to discover the correct formula for evaluating teachers.  And by implication, we now know how to train and improve teachers by using the scientifically validated methods of teaching.

The only problem is that things didn’t work out as the Gates folks had planned.  Classroom observations make virtually no independent contribution to the predictive power of a teacher evaluation system.  You have to dig to find this, but it’s right there in Table 1 on page 10 of one of the technical reports released yesterday.  In a regression to predict student test score gains using out of sample test score gains for the same teacher, student survey results, and classroom observations, there is virtually no relationship between test score gains and either classroom observations or student survey results.  In only 3 of the 8 models presented is there any statistically significant relationship between either classroom observations or student surveys and test score gains (I’m excluding the 2 instances where they report p < .1 as statistically significant).  And in all 8 models the point estimates suggest that a standard deviation improvement in classroom observation or student survey results is associated with less than a .1 standard deviation increase in test score gains.

Not surprisingly, a composite teacher evaluation measure that mixes classroom observations and student survey results with test score gains is generally no better and sometimes much worse at predicting out of sample test score gains than prior test score gains alone.  The Gates folks trumpet the finding that the combined measures are more “reliable” but that only means that they are less variable, not any more predictive.

But “the best mix” according to the “policy and practitioner brief” is “a composite with weights between 33 percent and 50 percent assigned to state test scores.”  How do they know this is the “best mix?”  It generally isn’t any better at predicting test score gains.  And to collect the classroom observations involves an enormous expense and hassle.  To get the measure as “reliable” as they did without sacrificing too much predictive power, the Gates team had to observe each teacher at least four different times by at least two different coders, including one coder outside of the school.  To observe 3.2 million public school teachers for four hours by staff compensated at $40 per hour would cost more than $500 million each year.  The Gates people also had to train the observers at least 17 hours and even after that had to throw out almost a quarter of those observers as unreliable.  To do all of this might cost about $1 billion each year.
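The $500 million figure above can be checked with a quick back-of-the-envelope calculation.  This is only a sketch: the teacher count, hours, and hourly rate come from the paragraph above, while the function name and the assumption that each of the four observations takes one observer-hour are mine.  It covers observer time only, not the training and discarded observers that push the rough estimate toward $1 billion.

```python
# Back-of-the-envelope check of the observation cost quoted above.
TEACHERS = 3_200_000       # public school teachers (figure from the post)
HOURS_PER_TEACHER = 4      # four observations, assumed to take one hour each
HOURLY_RATE = 40           # dollars per observer hour (figure from the post)


def observation_cost(teachers, hours_per_teacher, hourly_rate):
    """Return the annual cost in dollars of observer time alone (hypothetical helper)."""
    return teachers * hours_per_teacher * hourly_rate


annual_cost = observation_cost(TEACHERS, HOURS_PER_TEACHER, HOURLY_RATE)
print(f"${annual_cost:,}")  # $512,000,000
```

If a second coder sat in on every observation, the observer-time figure would double, which is one way the total could approach the billion-dollar range.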

And what would we get for this billion?  Well, we might get more consistent teacher evaluation scores, but we’d get basically no improvement in the identification of effective teachers.  And that’s the “best mix?”  Best for what?  It’s best for the political packaging of a centrally imposed, mechanistic teacher evaluation system, which is what this is all really about.  Vicki Phillips, who heads the Gates education efforts, captured in this comment what I think they are really going for with a composite evaluation score:

Combining all three measures into a properly weighted index, however, produced a result “teachers can trust,” said Vicki Phillips, a director in the education program at the Gates Foundation.

It’ll cost a fortune, it doesn’t improve the identification of effective teachers, but we need to do it to overcome resistance from teachers and others.  Not only will this not work, but in spinning the research as they have, the Gates Foundation is clearly distorting the straightforward interpretation of their findings: a mechanistic system of classroom observation provides virtually nothing for its enormous cost and hassle.  Oh, and this is the case when no stakes were attached to the classroom observations.  Once we attach all of this to pay or continued employment, their classroom observation system will only get worse.

I should add that if classroom observations aren’t useful as predictors, they also can’t be used effectively for diagnostic purposes.  An earlier promise of this project was that they would figure out which teacher evaluation rubrics were best and which sub-components of those rubrics were most predictive of effective teaching.  But that clearly hasn’t panned out.  In the new reports I can’t find anything about the diagnostic potential of classroom observations, which is not surprising since those observations are not predictive.

So, rather than having “figured out what makes a good teacher” the Gates Foundation has learned very little in this project about effective teaching practices.  The project was an expensive flop.  Let’s not compound the error by adopting this expensive flop as the basis for centrally imposed, mechanistic teacher evaluation systems nationwide.

(Edited for typos and to add links.  To see a follow-up post, click here.)

Head Start Manipulating Scumbags

December 20, 2012

I’ve heard that the latest round of results from the federal evaluation of Head Start is due to be released tomorrow afternoon.  And my psychic powers tell me that the results will show no lasting benefit from Head Start, just like the two previous rounds of results.

You heard that right — the federal government is releasing results that the administration dislikes on a Friday afternoon just before Christmas.  They might as well put the results on display in a locked filing cabinet in a disused lavatory behind the sign that says “beware of the leopard.”

Why is the Department of Health and Human Services burying this study just like they delayed, buried, or distorted the previous ones?  Well, because the study is an extremely rigorous and comprehensive evaluation, involving random assignment of a representative sample of all Head Start students nationwide, that I expect will find no enduring benefits from this program that politicians, pundits, and other dimwits constantly want to expand and fund.  Anyone who casts doubt on think tank research should cast a critical eye toward gross manipulations and abuse of research that are perpetrated by the federal government.

I should repeat that the researchers have done an excellent job evaluating Head Start in this case.  It is the bureaucratic class at the Department of Health and Human Services who have cynically manipulated, delayed, and misreported this research.  The pending report is already delayed several years and has been around for a long time.  The decision to release it on the Friday afternoon before Christmas is completely calculated.

I don’t know your names, but I’m going to invest a little energy in tracking down who is responsible for this cynical abuse of research.  If there were any reporters worth their salt left out there, they would bother to expose you but I guess that job has now been passed to bloggers and enterprising individuals.  When I do find your names I will post them so folks can know who the scumbags are who think they can manipulate the policy community by delaying, burying, or misreporting research.  And then when you get hired by that DC think tank, advocacy organization, or other waste of space we’ll be able to remember who you are and assign no credibility to what you have to say.  These kinds of dastardly acts by public servants should not be cost free and if I have any say in the matter they will not be in this case.

Florida Crushes the Ball on Progress in International Literacy Study

December 11, 2012

(Guest Post by Matthew Ladner)

TIMSS and PIRLS released 2011 results today in a variety of subjects. This time a handful of states were brave enough to volunteer for a pullout of their results. Here are the results on 4th grade reading:


Here are the pullouts:


You got it: Florida students notched the second highest score in the world. Even above (gasp!) Finland.

Late for a meeting. More later, but for now:


And You Thought Administrative Bloat in Higher Ed Was Bad…

October 24, 2012

When Brian Kisida, Jonathan Mills, and I released our study of administrative bloat in higher education through the Goldwater Institute, we thought it was bad that universities had increased their hiring of administrators (professional staff who are not faculty) at twice the rate of faculty.

I now realize that the perpetrators of waste in higher ed are mere amateurs.  The administrative bloat pros can be found in K-12 education.  According to a new report from the Friedman Foundation released today, student enrollment has increased 96% since 1950, but the growth in “administrators and other non-teaching staff [was] a staggering 702 percent.”
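To put those two growth rates side by side, here is a rough sketch of the arithmetic.  The percentages are the ones quoted above; the helper name is mine, and the calculation assumes both figures cover the same period since 1950.

```python
# Rough comparison of the two growth rates quoted above.
ENROLLMENT_GROWTH_PCT = 96   # student enrollment growth since 1950 (from the post)
STAFF_GROWTH_PCT = 702       # growth in administrators and other non-teaching staff


def growth_multiplier(percent_increase):
    """Convert a percent increase into a multiplier, e.g. 96% -> 1.96x (hypothetical helper)."""
    return 1 + percent_increase / 100


# How much non-teaching staff per student has grown over the period.
staff_per_student_multiplier = (
    growth_multiplier(STAFF_GROWTH_PCT) / growth_multiplier(ENROLLMENT_GROWTH_PCT)
)
print(round(staff_per_student_multiplier, 2))  # 4.09
```

Under that assumption, schools now employ roughly four times as many non-teaching staff per student as they did in 1950.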

The report provides results state by state, highlighting the growth in staffing in recent years.  Even in the few states where enrollment has declined, staffing levels have grown dramatically.  Check it out.

Charters v. Private Schools: Urban and Suburban Differences

August 28, 2012

(Guest post by Greg Forster)

Cato has new research out from Richard Buddin, examining where charter schools draw their students from. Adam Schaeffer offers a summary, emphasizing the dangers of charter schools: “On average, charter schools may marginally improve the public education system, but in the process they are wreaking havoc on private education.”

I agree with the basic premise: charters don’t fix the underlying injustice of government monopolizing education by providing “free” (i.e. free at the point of service, paid for by taxpayers) education, driving everyone else out of the education sector. As Jay and I have argued before, vouchers make the world safe for charters; that implies you can view charters as a response by the government to protect its monopoly against the disruptive threat of voucher legislation.

But what interests me more are the urban/suburban and elementary/secondary breakdowns of these data. It appears that charters are only substantially cutting into private schools in “highly urban” areas. In the suburbs, the charter school option is framed much more in terms of boutique specialty alternatives (schools for the arts, classical education, etc.) rather than “your school sucks, here’s one that works.” If you’d asked me, I would have guessed that would also cut heavily into the private school market – it would appeal to parents of high means who are looking for something out of the ordinary for their children, and that demographic would be most likely to already be in private schools. Yet the data show otherwise; apparently the families choosing boutique suburban charters weren’t much impressed with their private school options. And what’s up with this weird distribution on the elementary/secondary axis? Apparently public middle schools really stink in urban/suburban border areas.


Blinding Us with Science

August 15, 2012

(Guest post by Greg Forster)

Jay’s proposed reforms to the way Gates handles science are relevant far beyond the Gates Foundation, and foundations generally. He’s helping us think about how to wrestle with a deeper problem.

Public policy arguments need an authority to which they can appeal. The percentage of the population that is both willing and able to absorb all the necessary information to make a responsible decision without relying on pretty sweeping appeals to authority is very small. And even for us wonks, you can’t reduce the role of authority to zero; life doesn’t work that way. (Economists call this “the information problem.”)

So it’s normal, natural and right for public policy arguments to make some appeals to authority. The problem is that increasingly, our culture has no widely recognized authorities other than science. When there are many potential loci of authority, there is less pressure to corrupt them. If the science doesn’t back your view, you can appeal to other sources of authority. Where there is only one authoritative platform, there’s no alternative but to seize it.

As I once wrote:

Say that you favor a given approach – in education, in politics, in culture – because it is best suited to the nature of the human person, or because it best embodies the principles and historic self-understanding of the American people, and you will struggle even to get a hearing. But if you say that “the science” supports your view, the world will fall at your feet.

Of course, this means powerful interest groups rush in to seize hold of “science,” to trumpet whatever suits their preferences, downplay its limitations, and delegitimize any contrary evidence. If they succeed – which they don’t always, but they do often enough – “the science” quickly ceases to be science at all. That’s why “scientific” tyrannies like the Soviet Union had to put so many real scientists in jail – or in the ground.

We need other sources of wisdom and knowledge – and hence of authority, because those who are recognized as having wisdom and knowledge will be treated as sources of authority – besides science. As Jay has written:

Science has its limits.  Science cannot adjudicate among the competing values that might attract us to one educational approach over another.  Science usually tells us about outcomes for the typical or average student and cannot easily tell us about what is most effective for individual students with diverse needs.  Science is slow and uncertain, while policy and practice decisions have to be made right now whether a consensus of scientific evidence exists or not.  We should rely on science when we can but we also need to be humble about what science can and can’t address…

My fear is that the researchers, their foundation-backers, and most-importantly, the policymaker and educator consumers of the research are insensitive to these limitations of science.  I fear that the project will identify the “right” way to teach and then it will be used to enforce that right way on everyone, even though it is highly likely that there are different “right” ways for different kids…

Science can be corrupted so that it simply becomes a shield disguising the policy preferences of those in authority.  How many times have you heard a school official justify a particular policy by saying that it is supported by research when in fact no such research exists?  This (mis)use of science is a way for authority figures to tell their critics, “shut up!”

To summarize the whole point, our group of school choice researchers put it well (false humility aside) in our Education Week op-ed earlier this year:

Finally, we fear that political pressure is leading people on both sides of the issue to demand things from “science” that science is not, by its nature, able to provide. The temptation of technocracy—the idea that scientists can provide authoritative answers to public questions—is dangerous to democracy and science itself. Public debates should be based on norms, logic, and evidence drawn from beyond just the scientific sphere.

What can we do about it? Beyond building in checks and balances to ensure that science isn’t being abused, we can make a deliberate effort to appeal to non-scientific sources of wisdom. There’s nothing unscientific about relying on “norms, logic, and evidence drawn from beyond just the scientific sphere.” In Pride and Prejudice, Caroline Bingley comments that it would be more rational if there were more conversation and less dancing at balls; her brother comments that this would indeed be “much more rational, I dare say, but much less like a ball.” It might be more scientific if our civic discourse appealed to nothing but science, but it would be much less like civic discourse.

For a good example of what I mean, check out Freedom and School Choice in American Education. When it came out, I commented on how it showed the diverse values that had led the authors to support school choice:

What’s particularly valuable about this book, I think, is how it gives expression to the very different paths by which people come to hold educational freedom as an aspiration, and then connects those aspirational paths to the practical issues that face the movement in the short term. Jay comes to educational freedom with an emphasis on accountability and control; against the Amy Gutmanns of the world who want to set up educational professionals as authority figures to whom parents must defer, Jay wants to put parents back in charge of education. Matt comes to educational freedom with an emphasis on alleviating unjustified inequalities; against the aristocrats and social Darwinists of the world who aren’t bothered by the existence of unjustified inequalities, Matt wants social systems to maximize the growth of opportunities for those least likely to have access to them. And I come to educational freedom with an emphasis on the historical process of expanding human capacities, especially as embodied in America’s entrepreneurial culture; against all forms of complacency, I want America to continue leading the world in inventing ever better ways of flourishing the full capacities of humanity. And each of the other contributors has his or her own aspirational path.

Individual liberty; the lifting up of the poor and the marginalized; the American experiment in enterprise culture. These are fine things worth fighting for, and they would remain so no matter what the science says.

President Bush Discusses Global Report Card

July 19, 2012

Last fall Josh McGee and I developed the Global Report Card (GRC) for the George W. Bush Institute. The GRC is a tool that allows people to compare the level of academic achievement in virtually every school district in the United States to the average for their state, the country, and a comparison group of 25 industrialized countries.

Above is a new interview with President Bush in which he discusses the Global Report Card (it’s around minute 25).

The Global Report Card received a fair amount of coverage when it was released, but keep your eyes out for an updated and improved version this upcoming fall.  The results of the GRC are consistent with other international comparisons, including a series of pieces by Eric Hanushek, Paul Peterson, and Ludger Woessmann (the most recent of which can be found here).  But the GRC goes a step further by allowing comparisons to be made at the school district level.  GRC 2.0 will also have some new features and comparisons that people might find useful.

School Choice and the Greenfield School Revolution

June 5, 2012

(Guest post by Greg Forster)

Today, the Friedman Foundation is releasing a study I did with James Woodworth: The Greenfield School Revolution and School Choice. We know from previous research that vouchers (and equivalent programs like tax credits and ESAs) consistently deliver better academic performance, but the size of the impact is not revolutionary. Meanwhile, the whole world is watching as charter school operators (Carpe Diem, Rocketship, Yes Prep, etc.) reinvent the school from the ground up.

It’s ironic that these schools are charters, not voucher schools. A properly designed (i.e. universal) choice program would do a better job than charters of supporting these highly ambitious “greenfield” school models. But existing choice programs are not properly designed, so our impression was that they’re excluding these educational entrepreneurs, instead simply transferring students from one existing set of schools (public) to another (private).

We wanted to test our theory and make sure it was true, not just an accident of publicity or media bias, that the reinvention of the school wasn’t being supported by existing choice programs. We combed through twenty years’ worth of federal data (CCD and PSS) to see if we could find any evidence of disruption in the structure of the private school sector in places that had school choice programs.

We found that while existing school choice programs may be delivering moderately better academic outcomes, they aren’t disrupting the private school sector the way they need to be. In one or two places we found visible impacts, but nothing like a reinvention of schooling. The only impact of any considerable size is the dramatic change in racial composition in the private school population of Milwaukee.

In addition to the empirical findings, the study outlines 1) why radical “greenfield” school models are essential to drive the kind of education reform we need, and 2) why universal school choice would do a better job than charter schools of sustaining it.

Special thanks to Rick Hess, from whom we borrow the term “greenfield,” and Jay Greene for giving us their comments and insights as we developed this study!

