Rand has released its evaluation of the Gates Foundation’s Intensive Partnerships for Effective Teaching initiative, and the results are disappointing. As the report summary describes it, “Overall, however, the initiative did not achieve its goals for student achievement or graduation, particularly for LIM [low income minority] students.” But in traditional contract-research-speak, this summary really understates what they found. You have to slog through the 587 pages of the report and 196 pages of the appendices to find that the results didn’t just fall short of the goals; they were generally null to negative across a variety of outcomes.
Rand examined the Gates effort to develop new measures of teacher effectiveness and to align teacher employment, compensation, and training practices to those measures in three school districts and a handful of charter management organizations. According to the report, “From 2009 through 2016, total IP [Intensive Partnership] spending (i.e., expenditures that could be directly associated with the components of the IP initiative) across the seven sites was $575 million.” In addition, Rand estimates that the cost of staff time to conduct the evaluations used to measure effectiveness totaled about $73 million in 2014-15, a single year of the program. Assuming that this staff-time cost was roughly the same across the seven years of the program they examined, the total cost of the initiative exceeded $1 billion. The Gates Foundation paid $212 million of this cost, with the rest covered primarily by “site funds,” which I believe means local tax dollars. The federal government also contributed a significant portion of the funding.
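The back-of-the-envelope arithmetic behind that $1 billion figure can be checked in a few lines, using only the numbers quoted above and the same assumption the post makes (that staff-time costs held steady at the 2014-15 level across all seven years):

```python
# Rough total cost of the IP initiative, per the figures quoted
# from the Rand report.
program_spending = 575_000_000    # direct IP spending, 2009-2016
staff_time_per_year = 73_000_000  # estimated staff evaluation time, 2014-15
years = 7                         # assume the 2014-15 cost held each year

total = program_spending + staff_time_per_year * years
print(f"${total / 1e9:.2f} billion")  # → $1.09 billion
```

The staff-time estimate alone ($511 million over seven years) nearly matches the direct program spending, which is why the total clears $1 billion even though the Foundation's own contribution was $212 million.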
So what did we get for $1 billion? Not much. One outcome Rand examined was whether the initiative made schools more likely to hire effective teachers. The study concluded:
Our analysis found little evidence that new policies related to recruitment, hiring, and new-teacher support led to sites hiring more-effective teachers. Although the site TE [teacher effectiveness] scores of newly hired teachers increased over time in some sites, these changes appear to be a result of inflation in the TE measure rather than improvements in the selection of candidates. We drew this conclusion because we did not observe changes in effectiveness as measured by study-calculated VAM scores, and we observed similar improvements in the site TE scores of more-experienced teachers.
Another outcome was the increased retention of effective teachers:
However, we found little evidence that the policies designed, in whole or in part, to improve the level of retention of effective teachers had the intended effect. The rate of retention of effective teachers did not increase over time as relevant policies were implemented (see the leftmost TE column of Table S.1). A similar analysis based only on measures of value added rather than on the site-calculated effectiveness composite reached the same conclusion (see the leftmost VAM column of Table S.1).
Did the program improve teacher effectiveness overall, and did it specifically expand low-income minority students’ access to effective teachers?
…An analysis of the distribution of TE based on our measures of value added found that TE did not consistently improve in mathematics or reading in the three IP districts. There was very small improvement in effectiveness among mathematics teachers in HCPS [Hillsborough County] and SCS [Shelby County] and larger improvement among reading teachers in SCS, but there were also significant declines in effectiveness among reading teachers in HCPS and PPS [Pittsburgh]. In addition, in HCPS, LIM students’ overall access to effective teaching and LIM students’ school-level access to effective teaching declined in reading and mathematics during the period of the initiative (see Table S.2). In the other districts, LIM students did not have consistently greater access to effective teaching before, during, or after the IP initiative.
And was there an overall change as a result of the program in student achievement and graduation rates?
Our analyses of student test results and graduation rates showed no evidence of widespread positive impact on student outcomes six years after the IP initiative was first funded in 2009–2010. As in previous years, there were few significant impacts across grades and subjects in the IP sites.
Here I think the report is putting a more positive spin on the results than its findings warrant. Check out this summary of results from each of the sites:
I see a lot more red (significant and negative effects) than green (significant and positive). The report’s overall conclusion is technically true only because it focuses on the last year (2014-15) alone and examines each of the four sites separately. A combined analysis across sites and across time, which they don’t provide, would likely show a significant and negative overall effect on test scores.
The attainment effects are also mostly negative. To find the attainment results at all, you have to dive into a separate appendix file. There you will see that Pittsburgh experienced a decrease in dropout rates of between 1.3 and 3.5%, depending on the year, which is a positive result. But Shelby County showed a significant decrease in graduation rates in every year but one, with the decline as large as 15.7%. (Dropout, unlike graduation rate, is an annualized measure, so the two figures are not directly comparable.) The charter schools also showed a significant decrease in graduation rates as a result of the program in every year but one, with the decline as large as 6.6%. And Hillsborough experienced a significant increase in its dropout rate of about 1.5% in one year. In three of the four sites examined there were significant, negative effects on attainment; in one site there were positive effects.
The difference-in-differences analysis that Rand is using is not perfect at isolating causal effects. And as the report notes, comparison districts were sometimes implementing reform strategies similar to those of the Partnership sites. But you would expect the injection of several hundred million dollars and considerable expert attention to improve implementation in the Partnership districts, so the comparison is still informative. Besides, the fact that some comparison districts were pursuing some of the same reforms does not explain the splattering of red (negative and significant effects) we see.
As Mike McShane and I note in the book we recently edited on failure in education reform, there is nothing inherently wrong with trying a reform and having it fail. The key is learning from failure so that we avoid repeating the same mistakes. It is pretty clear that the Gates effective teaching reform effort failed pretty badly. It cost a fortune. It produced significant political turmoil and distracted from other, more promising efforts. And it appears to have generally done more harm than good with respect to student achievement and attainment outcomes.
The Rand report draws at least one appropriate lesson from this experience:
A favorite saying in the educational measurement community is that one does not fatten a hog by weighing it. The IP initiative might have failed to achieve its goals because the sites were better at implementing measures of effectiveness than at using them to improve student outcomes. Contrary to the developers’ expectations, and for a variety of reasons described in the report, the sites were not able to use the information to improve the effectiveness of their existing teachers through individualized PD, CLs, or coaching and mentoring.