Ed Reform Political Judgment Often Wrong

August 21, 2018

The Ed Reform Establishment tends to favor more highly regulated and targeted school choice programs.  When challenged on the merits of those preferences, they sometimes acknowledge that regulating and targeting choice may not produce better outcomes but they assert that such approaches have political advantages over less regulated and more universal programs.

The string of political failures, from Question 2 in Massachusetts to the inability of portfolio management to catch on (or even sustain itself in New Orleans), suggests that the Ed Reform Establishment lacks sensible political judgment.  But if we need more evidence that Ed Reformers are out of sync with political sentiment, just look at the findings of the new Ed Next Poll (co-authored by our new faculty member, Albert Cheng).

Of course, the way people answer poll questions does not directly translate into what is likely to be politically successful or not given how important political organization and strength of sentiment are in mobilizing opinion into policy.  But opinion polls give us some idea of what sentiment is out there for organizations to try to mobilize.  And political sentiment very clearly goes against the political calculations of the Ed Reform Establishment.

For example, Ed Reform experts tell us charters are more likely to be political winners than private school choice.  But the polling shows vouchers running 10 points ahead of charters, with universal vouchers favored by 54% compared to 44% for charters.  Tax credit private school choice programs poll even higher, despite drawing little interest from the Ed Reform Establishment.

Ed Reform experts tell us that vouchers targeted toward the disadvantaged are more likely to be politically successful than universal programs.  But in the polling, universal vouchers hold an 11 percentage point advantage over targeted vouchers, which are supported by only 43% of the sample.

Other darlings of the Ed Reform Establishment also do not poll well.  The establishment bet heavily that general sympathy for standards could be channeled into supporting the specific proposal of Common Core standards.  But once the abstract idea of standards becomes the concrete proposal of Common Core, support drops from 61% to 45%, below the 50% Mendoza line needed to overcome organized political resistance.

Heavily restricting local autonomy over disciplinary policy to ensure racial equity is also strongly favored by the Ed Reform Establishment, but it is deeply unpopular with the public, including teachers.  Only 27% of the public and 28% of teachers support “federal policies that prevent schools from expelling or suspending black and Hispanic students at higher rates than other students.”  Support for this is barely higher among Hispanic (35%) and African American (42%) respondents.

Lastly, the Ed Reform Establishment is very keen on “managed” enrollment systems that consider race and income in assigning students to schools.  The public does not share this enthusiasm.  Only 18% of the public, 27% of teachers, 24% of Hispanics, and 31% of African Americans think “public school districts [should] be allowed to take the racial background of students into account when assigning students to schools.”  There is even less support for considering income when assigning students to schools.

Why does the Ed Reform Establishment so badly lack an accurate read on what has political support?  I suspect that Ed Reform has increasingly become a vanity project — a way to signal virtue to each other — rather than a movement to make realistic and beneficial changes in policy.  This is exacerbated by a lack of consequences for ed reformers whose political bets regularly fail.  We seem to favor accountability for teachers but don’t seem to have much of it within the reform movement.

(Note: I’ve corrected the spelling of judgment.  Judgement is accepted in British English, but it is not standard in American usage.)


On Taking Ignorance Seriously

August 16, 2018


(Guest post by Greg Forster)

I have a post at OCPA on taking ignorance seriously:

Some states managed to have success in some cases—Massachusetts’ standards reforms and Florida’s mix of school choice, exit exams, and incentives to raise test scores across demographic groups are notable examples. But the overall story was failure. We just couldn’t take these good ideas to scale…

Why was school choice the only winner? Because it takes our ignorance seriously. It doesn’t try to generalize the content of education across millions of unique children.

Throwback to my review of the evidence on Pre-K included at no extra charge!


Masters in Someone Else’s Home is No Way to Go Through Life

August 15, 2018

(Guest Post by Matthew Ladner)

In the film Gandhi a crucial scene involves a meeting with British colonial overlords.  At one point a British official plays what seems to be an ultimate trump card: in essence, that His Majesty has millions of Muslim subjects in India, and without British administration a civil war would break out.  Gandhi’s response: yes, this is a problem, but it is our problem, not yours.

This scene came to mind when I read this Houston Chronicle article detailing how the Houston Independent School District narrowly avoided a state takeover.  Money quote from the article:

HISD and civic leaders are expected to gather for a celebration Wednesday at Worthing High School, which has suffered dramatic academic declines in recent years amid constant leadership turnover, persistent concerns about safety and a drain of students to school choice options.

One could spend a long time just unpacking that sentence, but I for one am happy that students at this school had the opportunity to seek a different setting, which makes the “drain of students” framing simply mind-blowing.  There is also something deeply perverse about “celebrating” at Worthing given the state of affairs there.  We get to keep things the same: hoorah!?

But in the end, in a sad but important way, the answer is kind of yes.

The Texas legislature should feel no small degree of wariness about the statute it passed that could have the Texas Education Agency taking over districts and/or closing schools.  I’ve seen K-12 focus groups address the closure issue, and people came across as uniformly and passionately against the entire notion of government-led closures based on test scores.

On district takeovers: if not for the manifest flaws of school district democracy, we could all be doing something else with our time.  School district elections are low-turnout, low-information affairs that sadly lend themselves readily to regulatory capture by organized employee/contractor interests.  The word on the street, for instance, is that the AFT swept the last round of HISD school board elections.

There may be ways to improve the quality of school district democracy that could be implemented from the state level.  I don’t, however, believe that suspending democracy, even a deeply flawed one, is one of those better ideas.  No, not even if it is “temporary,” and not even if it is “for their own good.”  Winston Churchill noted: “Many forms of Government have been tried, and will be tried in this world of sin and woe. No one pretends that democracy is perfect or all-wise. Indeed it has been said that democracy is the worst form of Government except for all those other forms that have been tried from time to time.”

Churchill, an old-school imperialist, probably did not have many notions in common with Gandhi, but where their Venn diagrams overlap it is probably best to pay attention.  In the end it gets back to build new, don’t reform old.  A district takeover is like a nuclear artillery piece (which used to be a thing): overpowered and a danger to those firing it.

 


Political Bias in Education Policy Research

August 13, 2018


Education policy research is not really a scientific enterprise.  If it were, the field would be equally open to accepting research of equal rigor regardless of the findings.  That is simply not the case.  Research with preferred findings is more easily published in leading journals and embraced by scholars than research supporting less favored results.

There are countless examples of this, but here is one to illustrate the point…

The Journal of Policy Analysis and Management, a top journal in our field, has just published an analysis of vouchers in Indiana based on a matching research design.  Although matching is normally intended to produce treatment and comparison groups that are nearly identical on observed characteristics, in this study the treatment group differed significantly from the control group on the pre-treatment measure of math performance.  Specifically, the treatment group had significantly higher math scores.  And the one negative effect observed by the study was on math test scores, roughly comparable in magnitude to the amount by which the treatment group was higher on math pre-treatment.  So, basically, the treatment group reverted to having about the same math scores as the control group once treatment began.  This negative effect, which was really the equalizing of the matched groups, was detected the first time students enrolled in a private school and did not grow in magnitude as students persisted in private school.  One might think that if private schools really harmed math scores, that harm would compound over time, but it did not.
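To see how a leftover baseline imbalance can masquerade as a treatment effect, here is a minimal simulation in Python (hypothetical numbers and code of my own, not the JPAM study’s data or methods).  Both groups have identical true ability and the “treatment” does nothing at all, but suppose matching happens to leave the treatment group 0.1 SD higher at baseline for transient reasons.  A naive gains comparison then reports a negative “effect” about the size of the baseline gap:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def test_noise():
    # Fresh measurement noise for each test administration.
    return rng.normal(0.0, 0.5, n)

# Control group: true ability plus noise at each wave.
ability_c = rng.normal(0.0, 1.0, n)
pre_c, post_c = ability_c + test_noise(), ability_c + test_noise()

# Treatment group: identical ability distribution, but matching happened to
# select students whose baseline scores ran 0.1 SD high by chance, and the
# treatment itself has zero true effect.
ability_t = rng.normal(0.0, 1.0, n)
pre_t = ability_t + test_noise() + 0.10
post_t = ability_t + test_noise()

print(f"pre-treatment gap:    {pre_t.mean() - pre_c.mean():+.3f}")   # ~ +0.10
print(f"post-treatment gap:   {post_t.mean() - post_c.mean():+.3f}")  # ~  0.00
naive = (post_t - pre_t).mean() - (post_c - pre_c).mean()
print(f"naive gains 'effect': {naive:+.3f}")  # ~ -0.10, purely an artifact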

These results certainly deserve publication and ought to inform the school choice policy debate, despite the obvious limitation that the matching design failed to make the groups comparable on the one outcome measure for which a negative effect was observed.  But it is questionable whether the article deserves a place in one of the field’s top journals, and even more doubtful that it should be given as much credence as some folks in the field seem willing to give it.

Corey DeAngelis and Pat Wolf have a similar school choice study based on a matching research design with similar imperfections.  It examines whether students enrolled in the Milwaukee voucher program were more likely to be accused or convicted of a crime in later years than comparable students who had attended Milwaukee’s public schools.  Students in the treatment group were matched to public school students on a number of observable characteristics, including the neighborhood in which they lived.  Despite that matching effort, the treatment and control groups were significantly different, with the treatment group having higher reading scores and being more likely to be female.  Unlike in the JPAM study, neither of these variables was the same as the outcome for which effects were observed.  Controlling for observable student and parental characteristics, students who had enrolled in Milwaukee’s voucher program were significantly less likely to be accused of a crime in later years.

The defects of Corey and Pat’s study are similar to those of the JPAM study.  It also uses a matching research design, and as I have said many times before, I don’t think we should have much confidence in matching designs to produce causal inferences.  And like the other study, Corey and Pat’s matching fails to produce treatment and control groups that are similar on all observed characteristics.  But unlike the other study, Corey and Pat’s research is not being published in JPAM.  In fact, JPAM desk rejected Corey and Pat’s study, deeming it unworthy even of being sent out for review.  A number of other journals did the same and they are now struggling to get it published in any journal.  I’m convinced that if only they had found that vouchers increased criminal behavior, their piece would already be in print in a respected journal.  But because they found a positive result for vouchers, the bar is higher and editors and reviewers can rightly note the defects in the study to justify rejection.

All research has limitations that might be invoked to support rejection or overlooked to support publication.  The double-standard used when judging voucher studies with favorable or unfavorable findings is a function of political bias and is an indication that our field is much less scientific than we would like to imagine.

It’s a shame that education policy researchers are largely uninterested in this problem of political bias.  Despite considerable energy devoted to promoting many dimensions of diversity within our field, there is virtually no effort to promote ideological diversity.  My department has a few researchers who would describe themselves as conservatives (and we have also had two faculty members who described themselves as socialists), but I suspect most departments don’t have any self-described conservatives, while others have no more than one or two.

It is interesting to note that despite having a department with six endowed chair holders, half of whom have Harvard doctorates, and all of whom have impressive research records, none of us have ever been asked to serve on the editorial boards of any journals (excluding the Journal of School Choice that my colleague, Bob Maranto, edits).  We’ve tried to play a part in governing our profession, but because we are branded (sometimes incorrectly) as conservatives we have been shunned.  The composition of editorial boards shapes who reviews submissions, which shapes what is published in those journals, which shapes what people in the field imagine the research consensus to be on various issues.

There are consequences to this political bias in our field.  First, the scientific quality of research is harmed by an increasing groupthink that fails to critically examine the key assumptions, methods, and implications of much of the work being produced.  Second, research in the field has diminished credibility and policy influence because others increasingly see the field as more ideological and less scientific.  Some of the leading people in our field regularly take to Twitter to deride policymakers and the public for failing to heed what they believe research has to say.  But why should policymakers obey “science” when it is being produced by an increasingly insular group of researchers who may confuse their political agenda for science?  Third, frustrated conservatives are likely to give up trying to be accepted by the dominant professional associations and journals and instead build their own parallel institutions.  The American Bar Association drove out conservatives, who built the Federalist Society, which now seems better at exercising policy influence than the “mainstream” organization.

I don’t expect this piece to alter this state of affairs.  Leading scholars in our field seem quite adept at defending their prior convictions, sometimes in remarkably unscholarly ways on social media, rather than critically examining their own beliefs and behaviors.  As far as I’m concerned they can rail away, but they will be left with the kind of nasty, unscientific, and irrelevant field they seem determined to build.


Pre-K Helps Test Scores in Short Run But Hurts Them Later

July 16, 2018


The Arnold Foundation’s Straight Talk On Evidence web site provides a very useful summary of a recently published large RCT on a state-funded pre-K program in Tennessee.  Consistent with a previous, nationally representative RCT of Head Start, this study found that students given access to government-funded pre-school by lottery initially score higher on standardized tests than those who lose the lottery, but then fare worse later.

In the TN study, treatment students score higher at the end of pre-K.  But, as the Arnold summary puts it:

At the end of third grade, the study found statistically-significant adverse effects on student math and science achievement. In math, the VPK group scored 0.12 standard deviations lower than the control group, which equates to roughly 13 percent less growth in math achievement than would be expected in the third grade year.[ii] In science, the VPK group scored 0.09 standard deviations lower than the control group, which equates to roughly 23 percent less growth in science achievement than would be expected in the third grade year.[iii]
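A quick back-of-the-envelope check of that conversion (my own arithmetic, not the study’s): divide the effect, in standard-deviation units, by the expected annual growth in the same units.  The growth figures below are inferred from the quoted percentages, so treat them as illustrative:

def share_of_annual_growth(effect_sd, expected_growth_sd):
    # Express an effect (in SD units) as a share of a year's expected growth.
    return effect_sd / expected_growth_sd

# Implied expected third-grade growth: ~0.92 SD in math, ~0.39 SD in science.
print(f"math:    {share_of_annual_growth(0.12, 0.92):.0%}")  # ~13%
print(f"science: {share_of_annual_growth(0.09, 0.39):.0%}")  # ~23%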

In an effort to explain the negative longer-term result, the authors suggest that special education may be to blame.  Students admitted to the government-funded pre-K program were more likely to be labeled as needing special education services, and that designation may have lowered academic expectations.  But this explanation is inconsistent with Hanushek, Kain, and Rivkin’s finding that special education tends to improve test score results.  Straight Talk at least considers the possibility that being with family or in a non-government-funded pre-school may simply be academically better for these children.

The hard reality is that the process of human development is complex and highly varied, so we just don’t know the optimal arrangements for all children.  Andy Smarick has an excellent piece along these lines in the Weekly Standard, suggesting that education policy experts suffer from a Hayekian information problem.  And this was also the subtext of my post last week on how parents are smarter than Technocrats.  Even when Technocrats are armed with the best science, they generally do not have enough information to centrally plan the lives of others.  This doesn’t mean that we never regulate anything.  It just means that if we do regulate, we should do so with great caution and large dollops of humility, because the experts are typically missing a lot of important information that the individuals they are regulating are more likely to possess.

But caution and humility are no fun, so the Arnold Foundation’s Straight Talk chooses instead to double down on Technocracy by suggesting that the disappointing results of pre-school shown in RCTs of both Head Start and the TN program be remedied by identifying which subset of pre-schools seems to be more effective and regulating programs toward imitating those schools:

The above findings and observations, we believe, underscore the need to reform programs such as VPK and Head Start by incorporating (i) rigorous evaluations aimed at identifying the subset of local approaches that are effective, and (ii) once such approaches are identified, strong incentives or requirements for other local program sites to adopt and faithfully implement them on a larger scale.

Keep in mind that the TN program already has regulations in place meant to ensure quality, including requiring at least 5.5 hours of instructional time per day, a cap of 20 students per classroom, a licensed teacher in each classroom, and the requirement that schools choose among a state approved set of curricula.  Also keep in mind that short-term test scores, which are the most common tool by which regulators monitor quality, showed positive results.

If these regulatory practices are insufficient to avoid harming students over the medium term, why would Straight Talk believe that doubling down on the Technocratic approach would make things better?  It would be nice if they at least considered the possibility that we are suffering from a Hayekian information problem and may be unable to devise optimal arrangements for education.


Parents are Smart. Technocrats are Dumb

July 12, 2018


The technocratic brand of ed reform that is currently dominant is based on the premise that policy elites, guided by science, need to ensure school quality.  Parents should have choices, but they should only choose among quality options.  Mostly using test scores, technocrats believe they can identify quality schools and quality-promoting educational practices, which should override parental preferences about which schools and practices offer a quality education.

A new study by Diether W. Beuermann and C. Kirabo Jackson suggests that parents may be better at detecting which schools promote long-term positive outcomes for their children than technocrats guided by short-term test scores.  They examine the school system in Barbados, in which parents seek admission for their children into schools they prefer, but those schools use test-score cut-offs to determine which students gain admission.  The cut-offs create a discontinuity that allows for rigorous causal identification of whether students who barely gain admission to a desired school have different outcomes than those with barely lower test scores who are denied admission.
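For readers unfamiliar with the method, here is a minimal sketch of the regression-discontinuity logic on simulated data (the cutoff, bandwidth, and the 0.2 “admission effect” are all made up; the paper’s actual estimation is more sophisticated):

import numpy as np

rng = np.random.default_rng(1)
n = 200_000
score = rng.uniform(-50, 50, n)        # entrance exam score, centered at cutoff
admitted = (score >= 0).astype(float)  # admission is mechanical at the cutoff
# Outcomes rise smoothly with the exam score, plus a 0.2 jump for admission.
outcome = 0.01 * score + 0.2 * admitted + rng.normal(0, 1, n)

h = 5.0  # bandwidth: compare only students just on either side of the cutoff
left = (score < 0) & (score >= -h)
right = (score >= 0) & (score < h)
# Fit a line on each side and measure the gap between them at the cutoff.
jump = (np.polyval(np.polyfit(score[right], outcome[right], 1), 0.0)
        - np.polyval(np.polyfit(score[left], outcome[left], 1), 0.0))
print(f"RD estimate of the admission effect: {jump:.2f}")  # ~0.20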

They find that test score gains are no greater for students who were admitted to the schools their parents preferred than those not admitted.  For boys there are some signs that the effect on test score gains may actually be negative.  But when they look at longer-term outcomes, including educational attainment, employment, and earnings, they find significant benefits for students who were admitted to the schools the parents preferred.  These positive effects were driven mostly by gains for girls.  When they explore mechanisms for why these gains occurred, they find a significant reduction in teen motherhood for girls admitted to preferred schools, which contributed to their educational attainment and later employment and earnings.  They also found that both boys and girls experienced significant long-term health benefits as measured by a healthy BMI, regular exercise, and dental check-ups if they gained admission to the schools their parents preferred.  The researchers conclude: “This suggests that preferred schools may promote productive habits and attitudes that are not measured by test scores but contribute to overall well-being. This may represent a significant, previously undocumented, return to school quality.”

So, parents, on average, could detect important aspects of school quality that technocrats guided by test scores would get wrong.  Technocrats would conclude that the schools that parents prefer do nothing to improve student outcomes because test scores don’t rise, or even go down, when students get into the school their parents want.  But parents are smarter than the technocrats.  They prefer schools that improve long-term outcomes for their children.  Specifically, parents seem to be able to choose schools that are more effective in developing the “character” of their children, making the students less likely to get pregnant as teens and more likely to be engaged in positive health behaviors later.  For boys this may not make a big difference in the labor market (although it does not harm those outcomes), but for girls these health improvements seem to drive higher educational attainment, employment, and earnings.

This study is consistent with a long line of research that finds a disconnect between short-term test score outcomes and long-term life outcomes, as described in a recent meta-analysis by my colleagues, Mike McShane, Pat Wolf, and Collin Hitt.  It’s amazing to me how champions of the technocratic approach continue to have faith that they possess scientific tools for identifying school quality that less well-informed parents lack, despite the growing body of scientific evidence demonstrating the very real defects of their approach.  Despite their daily hymns of praise to science, the technocrats don’t seem very scientific at all.

 


Religious Left Baptizes the Blob

July 5, 2018


(Guest post by Greg Forster)

In my latest for OCPA, I look at how Oklahoma’s religious left is baptizing the blob, including support for the teacher strike.

It’s not my position that religious leaders should have nothing to say about education policy:

It’s true that America’s great experiment in religious freedom implies our public policy can be based on shared moral commitments even if we disagree about the ultimate cosmic basis of those commitments. But as George Washington rightly pointed out in his farewell address, we can’t talk only about the morals of public policy and ignore the religious foundations of the morality upon which we draw. For if the foundations are neglected, the building collapses.

But if religious leaders are going to speak about education policy, they should make a serious theological argument and not just parrot the political talking points of secular special-interest groups. Otherwise they end up captive to political manipulators. This is exactly what happened to the religious right:

As a matter of fact, I’ve spent almost 10 years speaking out against the ideological captivity of the religious right. I appreciate that the fight for the sanctity of human life and other issues has accomplished some good. But the larger effect of the religious right movement was to push churches to become voter registration offices of the Republican Party. As it became clear what was going on, this did incalculable damage to the religious credibility of the churches involved. We are still living in the disastrous aftermath, as huge portions of our culture have disconnected themselves from faith entirely.

So I’m only playing fair when I say that I see the same dangerous sellout in the efforts of Oklahoma’s religious left to baptize the blob. The pronouncements of Oklahoma’s religious left on education don’t bring any theological light to the public policy questions. They’re not saying anything the secular left isn’t saying. They’re just pasting Bible verses on self-interested interest group politics. Organizing events and statements to support a secular special interest’s demand for money, parroting its secular talking points, doesn’t become a spiritual discipline because you do it with a clerical collar on—quite the reverse.

There are, in fact, serious theological arguments to be made on education policy. I’ve participated in some of them, including my response to theological arguments from the religious left as well as theological arguments from the religious right. So I welcome – though I often disagree on the merits – real theological arguments from the religious left and right. What’s alarming is when religious leaders make themselves tools of secular selfishness in the name of, yet to the detriment of, better schools for kids.


More Fake Statistics Hide Prevalence of Bullying in District Schools

June 27, 2018

(Guest Post by Jason Bedrick)

One of the many issues surfaced by Max Eden’s recent exposé on a New York City high school was how city officials’ insistence on keeping suspension rates down created incentives for lower-level bureaucrats to hide problems rather than address them. In an earlier report, he noted that under the DeBlasio administration’s new discipline policies, the NYC School Survey showed that “teachers report less order and discipline, and students report less mutual respect among their peers, as well as more violence, drug and alcohol use, and gang activity.” Despite this, DeBlasio declared that the city’s district schools experienced “the safest [year] on record.” What accounts for the disparity? The answer appears to be juked statistics.

In the wake of Eden’s exposé, many questioned how widespread this problem is. That’s a question researchers should set out promptly to address, but evidence from New Hampshire suggests that New York is far from an anomaly:

As told by schools’ self-reported statistics, the story of bullying in New Hampshire’s public schools is one of great progress. Since the signing of a landmark anti-bullying law, the number of incidents recorded by schools has dropped by more than half, from 5,561 in the 2010-2011 school year to 2,233 in 2016-2017, according to Department of Education data.

But advocates and state officials say those numbers belie the reality for Skylar and other students. More than a fifth of Granite State high-schoolers, for example, reported in a 2017 survey that they were bullied on school property during the previous year.

The Youth Risk Behavior Survey is administered annually to students across the country. Since 2009, the rate at which New Hampshire high school students say they have been bullied has stayed the same – between 21 and 22 percent – even as schools report more than 50 percent reductions in claims of bullying.

Yet again, it appears that school officials are working harder to hide incidents of bullying than address them:

The rate at which schools investigate students’ claims and find actual incidents of bullying has also dropped dramatically at the high school level. In 2010-2011, high schools confirmed bullying in 58 percent of reported incidents. Seven years later, it has dropped to 29 percent.

Some schools have put a lot of effort into stopping bullying, advocates say, but they believe the discrepancies in the data are evidence that some schools are exploiting weaknesses in the state’s law to under-report and underinvestigate claims of bullying.

The new spotlight on juked bullying statistics comes in response to two cases of student suicides over bullying in a state that has a smaller population than many cities. In both cases, parents argued that the schools didn’t do enough to protect their children from bullying.

Education officials in New Hampshire should work swiftly to correct perverse incentives and produce more accurate accounts of the level of bullying in the district school system, and schools should step up their efforts to combat bullying. In the meantime, bullied students should get access to educational choice options to provide an escape hatch from their tormentors.

 

 


Responding on Pre-K

June 27, 2018


(Guest post by Greg Forster)

The Oklahoman has run a quasi-response to my recent op-ed on whether Pre-K is worth the investment. I say “quasi-response” because the author, Craig Knutson, says he’s not trying to refute what I wrote, just putting his own two cents on the table – which is fair enough.

I appreciate that Knutson agrees Pre-K has to produce an “ROI” (his term) sufficient to justify the investment. Unfortunately, the evidence he provides doesn’t establish how large the ROI of Pre-K is:

  • He says he looked at “five distinct reports and programs,” but doesn’t tell us what they are, so we can’t evaluate either his characterization of their findings or the quality of their methods.
  • He says “all of the studies concluded that returns on investment were greatest among high-risk demographics,” which doesn’t tell us how great the returns were.
  • He says “Oklahoma certainly has a disproportionately large number of high-risk families and children.” This admittedly would depend on how those terms are defined, but it’s hard to think of any reasonable definition by which this assertion would be true – assuming “disproportionately” means “disproportionately compared to other U.S. states,” and I don’t know what other basis of comparison would be relevant. Oklahoma has plenty of struggling people, but not a “disproportionate” number of them as compared with, say, New York or Mississippi.
  • He says “another aspect of these programs was that each was voluntary.” He stresses that this means the programs were selected by parents because they’re valuable and produce returns. He thinks this supports his argument, because it’s evidence these programs are valuable. But if the parents themselves aren’t paying for the programs, then their choice by itself doesn’t establish ROI on a cost basis. More importantly, the public question in Oklahoma right now is whether Pre-K should be expanded. ROI will inevitably go down (as costs go up and benefits go down) when we stretch beyond families who have chosen Pre-K proactively, to rope in families that have to be goaded – or perhaps forced – into attending.
  • He quotes James Heckman saying that “high quality” programs produce benefits, without defining “high quality” or telling us how large the benefits are.
  • The only specific study Knutson cites is this one, which studies a highly targeted program for a specific population that doesn’t represent what Pre-K looks like for the general population. Knutson not only does not inform the reader that the study is looking at a targeted program, he actually portrays it as if it were a study of “high quality” Pre-K programs generally, serving the general population: “But Heckman’s latest research, ‘The Lifecycle Benefits of an Influential Early Childhood Program,’ shows that high-quality programs can deliver a return on investment of 13 percent per year.”

Other than that, there were no problems with it.


The Gates Effective Teaching Initiative Fails to Improve Student Outcomes

June 21, 2018

Rand has released its evaluation of the Gates Foundation’s Intensive Partnerships for Effective Teaching initiative and the results are disappointing.  As the report summary describes it, “Overall, however, the initiative did not achieve its goals for student achievement or graduation, particularly for LIM [low income minority] students.”  But in traditional contract-research-speak this summary really understates what they found.  You have to slog through the 587 pages of the report and 196 pages of the appendices to find that the results didn’t just fail to achieve goals but were generally null to negative across a variety of outcomes.

Rand examined the Gates effort to develop new measures of teacher effectiveness and align teacher employment, compensation, and training practices to those measures of effectiveness in three school districts and a handful of charter management organizations.  According to the report, “From 2009 through 2016, total IP [Intensive Partnership] spending (i.e., expenditures that could be directly associated with the components of the IP initiative) across the seven sites was $575 million.”  In addition, Rand estimates that the cost of staff time to conduct the evaluations to measure effectiveness totaled about $73 million in 2014-15, a single year of the program.  Assuming that this staff time cost was the same across the 7 years of the program they examined, the total cost of this initiative exceeded $1 billion.  The Gates Foundation paid $212 million of this cost, with the rest being covered primarily by “site funds,” which I believe means local tax dollars.  The federal government also contributed a significant portion of the funding.
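For what it’s worth, the arithmetic behind the “exceeded $1 billion” figure is simple (assuming, as noted above, that the $73 million single-year staff-time estimate held for each of the seven years):

direct_spending_m = 575      # total IP spending, 2009-2016, in $ millions
staff_time_per_year_m = 73   # estimated evaluation staff time for 2014-15
years = 7

total_m = direct_spending_m + staff_time_per_year_m * years
print(f"estimated total cost: ${total_m:,}M, or about ${total_m / 1000:.2f} billion")
# -> $1,086M, about $1.09 billion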

So what did we get for $1 billion?  Not much.  One outcome Rand examined was whether the initiative made schools more likely to hire effective teachers.  The study concluded:

Our analysis found little evidence that new policies related to recruitment, hiring, and new-teacher support led to sites hiring more-effective teachers. Although the site TE [teacher effectiveness] scores of newly hired teachers increased over time in some sites, these changes appear to be a result of inflation in the TE measure rather than improvements in the selection of candidates. We drew this conclusion because we did not observe changes in effectiveness as measured by study-calculated VAM scores, and we observed similar improvements in the site TE scores of more-experienced teachers.

Another outcome was the increased retention of effective teachers:

However, we found little evidence that the policies designed, in whole or in part, to improve the level of retention of effective teachers had the intended effect. The rate of retention of effective teachers did not increase over time as relevant policies were implemented (see the leftmost TE column of Table S.1). A similar analysis based only on measures of value added rather than on the site-calculated effectiveness composite reached the same conclusion (see the leftmost VAM column of Table S.1).

Did the program improve teacher effectiveness overall and specifically access by low income minority students to effective teachers?

…An analysis of the distribution of TE based on our measures of value added found that TE did not consistently improve in mathematics or reading in the three IP districts. There was very small improvement in effectiveness among mathematics teachers in HCPS [Hillsborough County] and SCS [Shelby County] and larger improvement among reading teachers in SCS, but there were also significant declines in  effectiveness among reading teachers in HCPS and PPS [Pittsburgh]. In addition, in HCPS, LIM students’ overall access to effective teaching and LIM students’ school-level access to effective teaching declined in reading and mathematics during the period of the initiative (see Table S.2). In the other districts, LIM students did not have consistently greater access to effective teaching before, during, or after the IP initiative.

And was there an overall change as a result of the program in student achievement and graduation rates?

Our analyses of student test results and graduation rates showed no evidence of widespread positive impact on student outcomes six years after the IP initiative was first funded in 2009–2010. As in previous years, there were few significant impacts across grades and subjects in the IP sites.

Here I think the report is casting a more positive spin on the results than their findings show.  Check out this summary of results from each of the sites:

I see a lot more red (significant and negative effects) than green (significant and positive). The report’s overall conclusion is technically true only because it focuses just on the last year (2014-15) and because it examines each of these 4 sites separately.  A combined analysis across sites and across time, which they don’t provide, would likely show a significant and negative overall effect on test scores.

The attainment effects are also mostly negative.  To find the attainment results at all, you have to dive into a separate appendix file.  There you will see that Pittsburgh experienced a decrease in dropout rates of between 1.3 and 3.5%, depending on the year, which is a positive result.  But Shelby County showed a significant decrease in graduation rates in every year but one.  While the dropout rate, unlike the graduation rate, is an annualized measure, the decrease in Shelby County’s graduation rate was as large as 15.7%.  The charter schools also showed a significant decrease in graduation rates as a result of the program in every year but one, with the decline as large as 6.6%.  And Hillsborough experienced a significant increase in its dropout rate of about 1.5% in one year.  In three of the four sites examined there were significant, negative effects on attainment.  In one site there were positive effects on attainment.

The difference-in-differences analysis that Rand is using is not perfect at isolating causal effects.  And as the report notes, comparison districts were sometimes implementing reform strategies similar to those of the Partnership sites.  But you would expect that the injection of several hundred million dollars and considerable expert attention would improve implementation in the Partnership districts, so the comparison is still informative.  Besides, the fact that some comparison districts were pursuing some of the same reforms does not explain the splattering of red (negative and significant effects) we see.
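For the uninitiated, the difference-in-differences logic boils down to subtracting the comparison districts’ change over time from the Partnership sites’ change.  A stylized example with made-up numbers (Rand’s actual models condition on many student and district characteristics):

# Hypothetical average scores, before and after the initiative.
pre_ip, post_ip = 50.0, 52.0      # Partnership sites
pre_comp, post_comp = 50.0, 53.0  # comparison districts

did = (post_ip - pre_ip) - (post_comp - pre_comp)
print(f"difference-in-differences estimate: {did:+.1f}")  # -1.0: IP sites lagged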

As Mike McShane and I note in the book we recently edited on failure in education reform, there is nothing inherently wrong with trying a reform and having it fail.  The key is learning from failure so that we avoid repeating the same mistakes.  It is pretty clear that the Gates effective teaching reform effort failed pretty badly.  It cost a fortune.  It produced significant political turmoil and distracted from other, more promising efforts.  And it appears to have generally done more harm than good with respect to student achievement and attainment outcomes.

The Rand report draws at least one appropriate lesson from this experience:

A favorite saying in the educational measurement community is that one does not fatten a hog by weighing it. The IP initiative might have failed to achieve its goals because the sites were better at implementing measures of effectiveness than at using them to improve student outcomes. Contrary to the developers’ expectations, and for a variety of reasons described in the report, the sites were not able to use the information to improve the effectiveness of their existing teachers through individualized PD, CLs, or coaching and mentoring.