New Grad Rate Study in Milwaukee

January 10, 2011

School Choice Wisconsin has released a new study by the University of Minnesota’s John Robert Warren of graduation rates for the voucher and public school systems in Milwaukee.  Here’s the highlight from the release:

Based on seven years of data, Professor Warren estimates that the graduation rate for students in Milwaukee’s choice program was about 18% higher than for students in MPS.  Had MPS achieved the same graduation rate as students in the MPCP, an additional 3,939 Milwaukee students would have graduated from 2003 to 2009.  Based on findings in separate research reported by the Milwaukee Journal Sentinel, the annual impact from these additional graduates would have been about $4.2 million in extra tax revenue and $24.9 million in additional personal income.

Warren’s research shows a general pattern of growth in Milwaukee graduation rates.  From 2003 to 2009 the MPS rate grew from 49% to 70%.  For the MPCP the rate grew from 63% to 82%.

Of course, this is not a causal analysis.  We do not know (and the study does not claim) that the higher grad rate among voucher students is caused by the program.  A forthcoming analysis by the University of Arkansas’ School Choice Demonstration Project, led by my colleague Patrick Wolf, should be able to address that issue.

But this descriptive report is nevertheless encouraging.  Not only do voucher students graduate at higher rates than MPS students, but both sectors have been improving their graduation rates.  That finding is consistent with a scenario in which choice and competition are improving outcomes for all students — public and private — in Milwaukee.


Rankings Revised

January 6, 2011

Rick Hess, along with Daniel Lautzenheiser, has devised a ranking of the “public presence” of education academics.  They developed a “7 item scoring rubric [that] reflects a given scholar’s body of academic work—encompassing books, articles, and the degree to which these are cited—as well as their footprint on the public discourse in 2010.”

There is always something arbitrary and crappy about these rankings, but Rick is right when he argues, “For all their imperfections, I think these [ranking] systems convey real information—and do an effective job of sparking discussion (about questions that are variously trivial and substantial).”  Recognizing that these kinds of rankings are part recreation and part reality, I’ve made a slightly revised ranking, presented below (with help from Misty Newcomb).

One of the problems with the ranking Daniel and Rick developed is that it combines some measures that accumulate over one’s career with other measures that only count accomplishments in the last year.  The career measures, Google Scholar and books published, will tend to be higher for people who have had longer careers.  Given that the ranking is meant to capture the current influence of education academics, these career items are biased in favor of senior scholars whose work may have been influential in the past, but less so in the present.

A more junior colleague pointed out this distortion to me, so I have tried to standardize the Google Scholar and book measures so that those with longer careers would have no particular advantage.  In particular, I calculated the sum of the two “career measures” — Google Scholar and books published.  Then I divided that sum by the years since the scholar received his or her terminal degree.  And to ensure that books and articles would still have the same weight in the overall score, I multiplied by the mean number of years since degrees were earned, about 23.2.

In making this adjustment I am assuming that every scholar would maintain the same rate of book and article productivity over his or her entire career.  So, the book and article “public presence” in the past year would be in proportion to the total book and article production per year over an entire career.

I make no changes to the 5 other measures in Daniel and Rick’s ranking: current Amazon sales as well as mentions in the education press, blogs, newspapers, and the Congressional Record.  All of those measures reflect current “public presence.”  Adding the two adjusted career measures to these annual measures, we get an adjusted total score.
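To make the arithmetic concrete, here is a minimal sketch of the adjustment in Python.  The function and example inputs are hypothetical stand-ins, not Rick and Daniel’s actual scoring data; it simply implements the rescaling described above.

```python
# Sketch of the career-measure adjustment described above.  All inputs are
# hypothetical stand-ins, not Hess and Lautzenheiser's actual scoring data.

MEAN_YEARS_SINCE_DEGREE = 23.2  # mean years since terminal degree across the scholars

def adjusted_total(google_scholar, books, years_since_degree, annual_measures):
    """Convert the two career measures to a per-year rate, then multiply by the
    mean career length so books and articles keep their original weight."""
    career = (google_scholar + books) / years_since_degree * MEAN_YEARS_SINCE_DEGREE
    # The 5 annual measures (Amazon sales; education press, blog, newspaper,
    # and Congressional Record mentions) are added in unchanged.
    return career + sum(annual_measures)

# A scholar 10 years past the terminal degree is scaled up; one 40 years out
# is scaled down, relative to the same raw career sum of 60.
print(adjusted_total(google_scholar=50, books=10, years_since_degree=10,
                     annual_measures=[5, 3, 2, 1, 0]))  # (60/10)*23.2 + 11 = 150.2
print(adjusted_total(google_scholar=50, books=10, years_since_degree=40,
                     annual_measures=[5, 3, 2, 1, 0]))  # (60/40)*23.2 + 11 = 45.8
```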

Making the adjustment for length of career does not alter who is at the very top of the rankings.  As you can see below, Diane Ravitch and Linda Darling-Hammond still rule the roost.  But there are some significant changes below that, where more junior scholars jump in the rankings and more senior scholars drop.  For example, Martin West leaps to 10th place from his previous ranking of 69th, surpassing his mentor, Paul Peterson, who drops from 5th to 11th.  Roland Fryer moves up to 3rd from 11th.  Jacob Vigdor rises to 16th from 43rd.  Susanna Loeb goes to 18th from 49th.  Matthew Springer rises to 29th from 74th.  And Brian Jacob, Jonah Rockoff, and Sara Goldrick-Rab all jump almost 30 places.

On the other hand, some more senior scholars decline significantly in their public presence ranking once we make this adjustment.  Gene Glass sinks from 20th to 50th.  Henry Levin falls from 17th to 52nd.  David Berliner drops from 19th to 57th.  Kenneth Zeichner moves from 30th to 62nd.

These changes make sense and, I think, improve Rick and Daniel’s ranking.  Hotshot researchers like Roland Fryer, Jacob Vigdor, Susanna Loeb, Matthew Springer, Brian Jacob, Jonah Rockoff, and Sara Goldrick-Rab are having a large impact on current education policy discussions even though their careers have not been long enough to accumulate long lists of books and articles.  The original ranking shortchanged these scholars in measuring their current “public presence.”

At the same time, more senior scholars, like Gene Glass, Hank Levin, David Berliner, and Kenneth Zeichner may have been given too much credit by the old ranking system for books and articles that were influential in the past but do not give them as much of a public presence in recent policy debates.

Of course, of greatest interest to me was what happened to my ranking.  I moved up to 21st from 39th.  This must be a better ranking.

Click on the images below to see the original and adjusted results for all 89 education academics that Rick and Daniel included in their “super-sized” ranking.  Have fun and, as David Letterman would say, please… no wagering.


The Education Reform Book is Dead

January 5, 2011

I have a new piece in the 10th anniversary edition of Education Next reviewing education reform books of the last decade.  My somewhat overstated thesis is that the education reform book is dead — that books don’t have nearly as much influence in shaping the education policy agenda as they used to.

Here is a taste:

Why is it so difficult to identify a book that embodies the incentive-based reforms of the decade and relatively easy to list books that argue against them? One reason is that books have lost their place as primary vehicles for shaping education policy. Just like in other realms, books are being displaced by other media.

A film like Waiting for “Superman” can have considerably more influence over education policy than any book. Articles and reports can be released on the Internet as soon as they are written. Even blogs are swaying education policy discussions to a greater extent than books. The power of blogs is especially clear when it comes to debating the merits of the research on various policy questions. There is little point in writing a book that reviews and adjudicates research findings when online articles and blog posts can do the same thing and be available within days or even hours.

The lack of policy influence that is attributable to recent education-reform books is not for lack of sales. Some have even become national best sellers. The problem is that policymakers and other elites are less likely to be among their readers. Instead, the buyers increasingly seem to be those actively participating in education reform debates; the people actually shaping policy appear to be paying relatively little attention.

For example, teachers and others hostile to incentive-based reforms consume works by Diane Ravitch, Linda Darling-Hammond, and Tony Wagner to affirm their worldview. These books are not setting the agenda for policymakers. They are feeding the resentment of practitioners to an education reform agenda that draws its inspiration from nonbook sources and is advancing despite the hostility stirred by such books. These best-selling volumes are, in the words of their intellectual nemesis, “standing athwart history, yelling stop.”


Jeb Kicks Off the New Year Right

January 3, 2011

Jeb Bush has an op-ed in today’s Wall Street Journal that gets the new year off to the right start.  Here’s a taste:

For the last decade, Florida has graded schools on a scale of A to F, based solely on standardized test scores. When we started, many complained that “labeling” a school with an F would demoralize students and do more harm than good. Instead, it energized parents and the community to demand change from the adults running the system. School leadership responded with innovation and a sense of urgency. The number of F schools has since plummeted while the number of A and B schools has quadrupled.

Another reform: Florida ended automatic, “social” promotion for third-grade students who couldn’t read. Again, the opposition to this hard-edged policy was fierce. Holding back illiterate students seemed to generate a far greater outcry than did the disturbing reality that more than 25% of students couldn’t read by the time they entered fourth grade. But today? According to Florida state reading tests, illiteracy in the third grade is down to 16%.

Rewards and consequences work. Florida schools that earn an A or improve by a letter grade are rewarded with cash—up to $100 per pupil annually. If a public school doesn’t measure up, families have an unprecedented array of other options: public school choice, charter schools, vouchers for pre-K students, virtual schools, tax-credit scholarships, and vouchers for students with disabilities.

Choice is the catalytic converter here, accelerating the benefits of other education reforms. Almost 300,000 students opt for one of these alternatives, and research from the Manhattan Institute, Cornell and Harvard shows that Florida’s public schools have improved in the face of competition provided by the many school-choice programs.

Florida’s experience busts the myth that poverty, language barriers, absent parents and broken homes explain failure in school. It is simply not true. Our experience also proves that leadership, courage and an unwavering commitment to reform—not demographics or demagoguery—will determine our destiny as a nation.


On Hiatus

December 19, 2010

Everything will get very quiet in the education world until the new year, so we’ll be taking a break from blogging for the next week or so.  See you in 2011.


Drill and Kill Kerfuffle

December 16, 2010

The reactions of New York Times reporter Sam Dillon and LA Times reporter Jason Felch to my post on Monday about erroneous claims in their coverage of a new Gates report could not have been more different.  Felch said he would look into the issue, discovered that the claimed negative relationship between test prep and value-added was inaccurate, and is now working on a correction with his editors.

Sam Dillon took a very different tack.  He accused me of “suggesting on the internet that I had misinterpreted an interview, and then you repeated the same thing about the Los Angeles Times. That was just a sloppy and irresponsible error.”  I’m not sure how Dillon jumps to this thin-skinned defensiveness when I clearly said I did not know where the error was made: “I don’t know whether something got lost in the translation between the researchers and Gates education chief, Vicki Phillips, or between her and Sam Dillon at the New York Times, but the article contains a false claim that needs to be corrected before it is used to push changes in education policy and practice.”

But more importantly, Dillon failed to check the accuracy of the disputed claim with independent experts.  Instead, he simply reconfirmed the claim with Gates officials: “For your information, I contacted the Gates Foundation after our correspondence and asked them if I had misquoted or in any way misinterpreted either Vicki Phillips, or their report on their research. They said, ‘absolutely not, you got it exactly right.'”

He went on to call my efforts to correct the claim “pathetic, sloppy, and lazy, and by the way an insult.”  I guess Dillon thinks that being a reporter for the New York Times means never having to say you’re sorry — or consult independent experts to resolve a disputed claim.

If Dillon wasn’t going to check with independent experts, I decided that I should — just to make sure that I was right in saying that the claims in the NYT and LAT coverage were unsupported by the findings in the Gates report.

Just to review, here is what Dillon wrote in the New York Times: “One notable early finding, Ms. Phillips said, is that teachers who incessantly drill their students to prepare for standardized tests tend to have lower value-added learning gains than those who simply work their way methodically through the key concepts of literacy and mathematics.”  And here is what Jason Felch wrote in the LA Times: “But the study found that teachers whose students said they ‘taught to the test’ were, on average, lower performers on value-added measures than their peers, not higher.”  And the correlations in the Gates report between student reports of test prep and value-added on standardized tests were all positive: “We spend a lot of time in this class practicing for the state test.” (ρ=0.195), “I have learned a lot this year about the state test.” (ρ=0.143), “Getting ready for the state test takes a lot of time in our class.” (ρ=0.103).  The report does not actually contain items that specifically mention “drill,” “work their way methodically through the key concepts of literacy and mathematics,” or “taught to the test,” but I believe the reporters (and perhaps Gates officials) are referencing the test prep items with these phrases.

I sent links to the coverage and the Gates report to a half-dozen leading economists to ask if the claims mentioned above were supported by the findings.  The following reply from Jacob Vigdor, an economist at Duke, was fairly representative of what they said even if it was a bit more direct than most:

I looked carefully at the report and come to the same conclusion as you: these correlations are positive, not negative.  The NYT and LAT reports are both plainly inconsistent with what is written in the report.  A more accurate statement would be along the lines of “test preparation activities appear to be less important determinants of value added than [caring teachers, teacher control in the classroom, etc].”  But even this statement is subject to the caveat that pairwise correlations don’t definitively prove the importance of one factor over another.  Maybe the reporters are describing some other analysis that was not in the report (e.g., regression results that the investigators know about but do not appear in print), but even in that case they aren’t really getting the story right.  Even in that scenario, the best conclusion (given positive pairwise correlations and a hypothetically negative regression coefficient) would be that teachers who possess all these positive characteristics tend to emphasize test preparation as well.

Put another way, it’s always good to have a caring teacher who is in control of the classroom, makes learning fun, and demands a lot of her students.  Among the teachers who share these characteristics, the best ones (in terms of value added) appear to also emphasize preparation for standardized tests.  I say “appear” because one would need a full-fledged multivariate regression analysis, and not pairwise correlations, to determine this definitively.

Another leading economist, who preferred not to be named, wrote: “I looked back over the report and I think you are absolutely right!”  I’m working on getting permission to quote others, but you get the idea.

In addition to confirming that a positive correlation for the test prep items means test prep is associated with higher value-added, not lower, several of these leading economists emphasized the inappropriateness of comparing correlations to draw conclusions about whether test prep contributes to value-added any more or less than other teacher practices observed by students.  They noted that any such comparison would require a multivariate analysis and not just a series of pairwise correlations.  And they also noted that any causal claim about the relative effectiveness of test prep would require some effort to address the endogeneity of which teachers engage in more test prep.
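To see why the economists keep insisting on multivariate analysis, here is a small illustrative simulation in Python.  The numbers are invented for the sake of the example (this is not the Gates data): it shows how a practice can be positively correlated with value-added on its own and still carry a negative coefficient once a correlated practice is controlled for.

```python
# Illustrative simulation only -- invented data, not the Gates study.
# A practice (test prep) can show a positive pairwise correlation with
# value-added yet have a negative coefficient in a multivariate regression,
# if the teachers who do more of it also share other effective practices.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
control = rng.normal(size=n)                      # e.g., classroom control/challenge
test_prep = 0.8 * control + rng.normal(size=n)    # effective teachers also do more prep
value_added = control - 0.2 * test_prep + rng.normal(size=n)

# Pairwise correlation of test prep with value-added is positive...
print(np.corrcoef(test_prep, value_added)[0, 1])  # roughly +0.28

# ...but a regression holding the other practice constant recovers the
# negative coefficient on test prep.
X = np.column_stack([np.ones(n), control, test_prep])
beta, *_ = np.linalg.lstsq(X, value_added, rcond=None)
print(beta)  # approximately [0.0, 1.0, -0.2]
```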

As David Figlio, an economist at Northwestern University, put it:

You’re certainly correct here.  A positive pairwise correlation means that these behaviors are associated with higher performance on standardized tests, not lower performance.  The only way that it could be an accurate statement that test prep is causing worse outcomes would be if there was a negative coefficient on test prep in a head-to-head competition in a regression model — though even then, one would have to worry about endogeneity: maybe teachers with worse-performing students focus more on test prep, or maybe lower-performing students perceive test prep to be more oppressive (of course, this could go the other way as well.)  But that was not the purpose or intent of the report.  The report does not present this as a head-to-head comparison, but rather to take a first look at the correlates between practice measures and classroom performance.

There was no reason for this issue to have developed into the controversy that it has.  The coverage contains obvious errors that should have been corrected quickly and clearly, just as Jason Felch is doing.  Tom Kane, Vicki Phillips, and other folks at Gates should have immediately issued a clarification as soon as they were alerted to the error, which was on Monday.

And while I did not know where the error occurred when I wrote the blog post on Monday, the indications now are that there was a miscommunication between the technical people who wrote the report and non-technical folks at Gates, like Vicki Phillips and the PR staff.  In other words, Sam Dillon can relax, since the mistake appears to have originated within Gates (although Dillon’s subsequent defensiveness, name-calling, and failure to check with independent experts hardly bring credit to the profession of journalism).

The sooner Gates issues a public correction, the sooner we can move beyond this dispute over what is actually a sidebar in their report and focus instead on the enormously interesting project on which they’ve embarked to improve measures of teacher effectiveness.  An apology from Sam Dillon would also be nice, but I’m not holding my breath.



False Claim on Drill & Kill

December 13, 2010

The Gates Foundation is funding a $45 million project to improve measures of teacher effectiveness.  As part of that project, researchers are collecting information from two standardized tests as well as surveys administered to students and classroom observations captured by video cameras in the classrooms.  It’s a big project.

The initial round of results was reported last week, with information from the student survey and the standardized tests.  In particular, the report described the relationship between classroom practices, as observed by students, and value-added on the standardized tests.

The New York Times reported on these findings Friday and repeated the following strong claim:

But now some 20 states are overhauling their evaluation systems, and many policymakers involved in those efforts have been asking the Gates Foundation for suggestions on what measures of teacher effectiveness to use, said Vicki L. Phillips, a director of education at the foundation.

One notable early finding, Ms. Phillips said, is that teachers who incessantly drill their students to prepare for standardized tests tend to have lower value-added learning gains than those who simply work their way methodically through the key concepts of literacy and mathematics. (emphasis added)

I looked through the report for evidence that supported this claim and could not find it.  Instead, the report actually shows a positive correlation between student reports of “test prep” and value added on standardized tests, not a negative correlation as the statement above suggests.  (See for example Appendix 1 on p. 34.)

The statement “We spend a lot of time in this class practicing for [the state test]” has a correlation of  0.195 with the value added math results.  That is about the same relationship as “My teacher asks questions to be sure we are following along when s/he is teaching,” which is 0.198.  And both are positive.

It’s true that the correlation for “Getting ready for [the state test] takes a lot of time in our class” is weaker (0.103) than other items, but it is still positive.  That just means that test prep may contribute less to value added than other practices, but it does not support the claim that  “teachers who incessantly drill their students to prepare for standardized tests tend to have lower value-added learning gains…”

In fact, on page 24, the report clearly says that the relationship between test prep and value-added on standardized tests is weaker than other observed practices, but does not claim that the relationship is negative:

The five questions with the strongest pair-wise correlation with teacher value-added were: “Students in this class treat the teacher with respect.” (ρ=0.317), “My classmates behave the way my teacher wants them to.” (ρ=0.286), “Our class stays busy and doesn’t waste time.” (ρ=0.284), “In this class, we learn a lot almost every day.” (ρ=0.273), “In this class, we learn to correct our mistakes.” (ρ=0.264). These questions were part of the “control” and “challenge” indices. We also asked students about the amount of test preparation they did in the class. Ironically, reported test preparation was among the weakest predictors of gains on the state tests: “We spend a lot of time in this class practicing for the state test.” (ρ=0.195), “I have learned a lot this year about the state test.” (ρ=0.143), “Getting ready for the state test takes a lot of time in our class.” (ρ=0.103).

I don’t know whether something got lost in the translation between the researchers and Gates education chief, Vicki Phillips, or between her and Sam Dillon at the New York Times, but the article contains a false claim that needs to be corrected before it is used to push changes in education policy and practice.

UPDATE —

The LA Times coverage of the report contains a similar misinterpretation: “But the study found that teachers whose students said they ‘taught to the test’ were, on average, lower performers on value-added measures than their peers, not higher.”

Try this thought experiment with another observed practice to illustrate my point about how the results are being mis-reported…  The correlation between student observations that “My teacher seems to know if something is bothering me” and value added was .153, which was less than the .195 correlation for “We spend a lot of time in this class practicing for [the state test].”  According to the interpretation in the NYT and LA Times, it would be correct to say “teachers who care about student problems tend to have lower value-added learning gains than those who spend a lot of time on test prep.”

Of course, that’s not true.  Teachers caring about what is bothering students is positively associated with value added just as test prep is.  It is just that teachers caring is a little less strongly related than test prep.  Caring does not have a negative effect just because the correlation is lower than other observed behaviors.

(edited for typos)


Finland Sucks

December 7, 2010

Actually, I don’t really think so.  But if I were Diane Ravitch and looked at the trend in PISA for Finland as she looked at the trend in NAEP for New York City, I would see that Finland has declined in reading, math, and science.  And then I would (wrongly) conclude that Finland sucks and is doing things all wrong.

Table 5.1 Finland’s mean scores on reading, mathematics and science scales in PISA (p. 118)

             PISA 2000   PISA 2003   PISA 2006   PISA 2009
Reading         546         543         547         536
Mathematics                 544         548         541
Science                                 563         554

Or perhaps if I really wanted to be like Diane Ravitch I would switch from looking at trends to levels of achievement, like when she looks at Massachusetts.  In that case, I would still think Finland is great and doing everything right.

Or maybe I could be like Diane Ravitch and switch to a different test that produced results more to my liking, like when Diane stopped paying attention to NAEP for New York City when it showed significant gains and started focusing instead on problems in the state test measures.

That’s the problem with being a manipulative propagandist.  It’s so hard to keep your story straight from one deception to another.


Supermarket Fail

December 7, 2010

It’s not just schools.  All organizations struggle to operate efficiently.  The only difference is that some organizations experience consequences when they fail and others don’t.

(HT: LW)


Chris Christie Kicks Butt at Jebfest 2010

December 7, 2010

You really have to check this out.  Christie has an intensity and directness about the problems with teacher unions and the educational status quo that is wonderfully refreshing and unfortunately too rare among politicians.

(You have to click the link in the first sentence above to go to CSPAN and watch the video.  I can’t seem to embed the video.  CSPAN and WordPress don’t seem to get along.)