Eduwonkette Apologizes

July 8, 2008

I appreciate Eduwonkette’s apology, posted on her blog and in a personal email to me. It is a danger inherent in the rapid-fire nature of blogging that people will write things more strongly and more sweepingly than they might upon further reflection. I’ve already done this on a number of occasions in only a few months of blogging, so I am completely sympathetic and unoffended.

One could argue that these errors demonstrate why people shouldn’t write or read blogs. In fact, some people have argued that ideas need a process of review and editing before they are shown to the public. These people tend to be ink-stained employees of “dead-tree” industries or academia, but they have a point: there are costs to making information available to people faster and more easily.

Despite these costs, the ranks of bloggers and web readers have swelled. The benefits of making more information available to more people, much faster, outweigh the costs of doing so. People who read blogs and other material on the internet are generally aware of the greater potential for error, so they usually place less confidence in information obtained from these sources than in sources with more elaborate review and editing processes. Some material from blogs eventually finds its way into print and more traditional outlets, and readers increase their confidence as that information receives further review.

Of course, the same exact dynamics are at work in the research arena.  Releasing research directly to the public and through the mass media and internet improves the speed and breadth of information available, but it also comes with greater potential for errors.  Consumers of this information are generally aware of these trade-offs and assign higher levels of confidence to research as it receives more review, but they appreciate being able to receive more of it sooner with less review.

In short, I see no problem with research initially becoming public with little or no review.  It would be especially odd for a blogger to see a problem with this speed/error trade-off without also objecting to the speed/error trade-offs that bloggers have made in displacing newspapers and magazines.  If bloggers really think ideas need review and editing processes before they are shown to the public, they should retire their laptops and cede the field to traditional print outlets. 

We have a caveat emptor market of ideas that generally works pretty well.

So it was disappointing that following Eduwonkette’s graceful apology, she attempted to draw new lines to justify her earlier negative judgment about our study released directly to the public.  She no longer believes that the problem is in public dissemination of non-peer-reviewed research.  She’s drawn a new line that non-peer-reviewed research is OK for public consumption if it contains all technical information, isn’t promoted by a “PR machine,” isn’t “trying to persuade anybody in particular of anything,” and is released by trustworthy institutions.

The last two criteria are especially bothersome because they involve an analysis of motives rather than an analysis of evidence. I defended Eduwonkette’s anonymity on the grounds that it doesn’t matter who she is, only whether what she writes is true. But if Eduwonkette believes that the credibility of the source is an important part of assessing the truth of a claim, then how can she continue to insist on her anonymity and still expect her readers to believe her? How do we know that she isn’t trying to persuade us of something and isn’t affiliated with an untrustworthy institution if we don’t know who she is? Eduwonkette can’t have it both ways. Either she reveals who she is or she remains consistent with the view that the source is not an important factor in assessing the truth of a claim.

No sooner does Eduwonkette establish her new criteria for the appropriate public dissemination of research than we discover that she has not stuck to those criteria herself. Kevin DeRosa asks her in the comments why she felt comfortable touting a non-peer-reviewed Fordham report on accountability testing. That report was released directly to the public without full technical information, was promoted by a PR machine, and comes from an organization that is arguably trying to persuade people of something and whose trustworthiness at least some people question.

So, she articulates a new standard: releasing research directly to the public is OK if it is descriptive and straightforward.  I haven’t combed through her blog’s archives, but I am willing to bet that she cites more than a dozen studies that fail to meet any of these standards.  Her reasoning seems ad hoc to justify criticism of the release of a study whose findings she dislikes.

Diane Ravitch also chimes in with a comment on Eduwonkette’s post: “The study in this case was embargoed until the day it was released, like any news story. What typically happens is that the authors write a press release that contains findings, and journalists write about the press release. Not many journalists have the technical skill to probe behind the press release and to seek access to technical data. When research findings are released like news stories, it is impossible to find experts to react or offer ‘the other side,’ because other experts will not have seen the study and not have had an opportunity to review the data.”

Diane Ravitch is a board member of the Fordham Foundation, which releases numerous studies on an embargoed basis to reporters “like any news story.”  Is it her position that this Fordham practice is mistaken and needs to stop?


What Does the Red Pill Do If I Don’t Take It?

June 19, 2008

 

(Guest Post by Matthew Ladner)

The hidden highlight from the Evaluation of the DC Opportunity Scholarship Program: Impacts After Two Years report is buried in the Appendix, pp. E-1 to E-2:

Applying IV analytic methods to the experimental data from the evaluation, we find a statistically significant relationship between enrollment in a private school in year 2 and the following outcomes for groups of students and parents (table E-1):

• Reading achievement for students who applied from non-SINI schools; that is, among students from non-SINI schools, those who were enrolled in private school in year 2 scored 10.73 scale score points higher (ES = .30) than those who were not in private school in year 2.

• Reading achievement for students who applied with relatively higher academic performance; the difference between those who were and were not attending private schools in year 2 was 8.36 scale score points (ES = .24).

• Parents’ perceptions of danger at their child’s school, with those whose children were enrolled in private schools in year 2 reporting 1.53 fewer areas of concern (ES = -.45) than those with children in the public schools.

• Parental satisfaction with schooling, such that, for example, parents are 20 percentage points more likely to give their child’s school a grade of A or B if the child was in a private school in year 2.

• Satisfaction with school for students who applied to the OSP from a SINI school; for example, they were 23 percentage points more likely to give their current school a grade of A or B if it was a private school.

I’m trying to figure out why the impact of actually using the voucher program isn’t the focus of this study, and is in fact relegated to an appendix. Instead, all the “mixed” results measure the impact of having been offered a scholarship, whether the student actually used it or not.

I’m going to walk way out on a limb here and predict that the impact on test scores of being offered but not using a voucher will be indistinguishable from zero. If this were a medical study, it would be as if patients in the experimental group were offered a drug, some of them chose not to take it, and we ignored that fact, measuring the drug’s impact on everyone offered it, takers and non-takers alike. Holding the pill bottle can’t be presumed to have the same effect as taking the pills.
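
To make the offer/use distinction concrete, here is a minimal simulation (fabricated numbers, not the evaluation’s data or its exact IV specification) of why an intent-to-treat comparison shrinks toward zero when many offered students never use the voucher, and how a simple instrumental-variables (Wald) calculation recovers the effect of actually attending a private school.

```python
# A minimal simulation (fabricated numbers, not the evaluation's data) of the
# difference between an intent-to-treat (ITT) comparison and an instrumental-
# variables (Wald) estimate of the effect of actually attending a private school.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = 10.0  # hypothetical scale-score gain from actually attending private school

offer = rng.integers(0, 2, n)            # random assignment: offered a voucher or not
used = offer * (rng.random(n) < 0.6)     # only about 60% of offered students use it
score = rng.normal(600, 30, n) + true_effect * used

# Intent-to-treat: compare everyone offered with everyone not offered.
itt = score[offer == 1].mean() - score[offer == 0].mean()

# Wald/IV estimate: scale the ITT by the difference in usage rates.
usage_gap = used[offer == 1].mean() - used[offer == 0].mean()
effect_of_use = itt / usage_gap

print(f"ITT (effect of the offer): {itt:5.2f}")           # roughly 0.6 * 10 = 6
print(f"IV (effect of actual use): {effect_of_use:5.2f}")  # close to 10
```

With take-up of about 60 percent, the offer effect is roughly six-tenths of the effect of actual use, which is why pooling users and non-users can make results look “mixed.”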

We’ve all been told that exercise is good for our health. Should we judge the effectiveness of exercise on health outcomes by what happens to those who actually exercise, or by the results for everyone that has been told that it is good for you?

This shortcoming has been corrected in the Appendix, but that is getting very little attention. On page 24 the evaluation reads:

Children in the treatment group who never used the OSP scholarship offered to them, or who did not use the scholarship consistently, could have remained in or transferred to a public charter school or traditional DC public school, or enrolled in a non-OSP-participating private school.

So in the report’s main discussion, the kids actually attending private schools have to make gains big enough to make up for the fact that many “treatment” kids are actually back in DCPS. As it turns out, several subsets of students do make such gains, but that’s not the point. The point is that we ought to be primarily concerned with whether actual use of the program improves education outcomes, and with the program’s systemic effects. We should certainly study who uses the program, who chooses not to, and why (very important information), but that sort of analysis belongs in the appendix, with the impact of actual use in the main report, not the other way around.

Receiving an offer of a school voucher doesn’t constitute much of an education intervention, and it seems painfully obvious that the discussion around this report is conflating the impact of voucher offers with that of voucher use. The impact of voucher use is clear and positive.


The SAT and College Grades

June 18, 2008

(Guest post by Larry Bernstein)

Yesterday, the College Board released a study of the predictive power of the SAT in estimating a student’s freshman-year college grade point average. A Bloomberg article criticized the results because of the relative ineffectiveness of the new SAT at predicting college grades: the SAT’s predictive power is only trivially improved by the addition of the new essay exam, which adds test time and is costly to grade.

I think this should come as no surprise, and it shows the general limitations of using standardized tests to predict college grades. One of the key points made in the study is that high school grades are a better predictor than the SAT; high school grades need to be combined with the SAT to best estimate GPA.

In my 1985 Wharton undergraduate statistics class, each student was required to complete a regression research project. By chance, I chose to predict my classmates’ college GPAs. I used 20 variables, including SAT score, and found only 5 with statistical significance: SAT score, number of hours studied, Jewish or Gentile, Wharton or another school such as the College of Arts and Sciences, and raised in the Northeast or elsewhere.
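
For readers who want to see the mechanics, here is a minimal sketch (with fabricated data rather than my 1985 survey responses, and only a few of the variables) of the kind of regression involved: predict GPA from SAT score, hours studied, and a couple of dummy variables, then check which coefficients are statistically significant.

```python
# A minimal sketch with fabricated data (not the original 1985 survey): regress
# GPA on SAT score, weekly hours studied, and two dummy variables, then inspect
# which coefficients are statistically significant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
sat = rng.normal(1250, 120, n)          # combined SAT score
hours = rng.uniform(5, 35, n)           # self-reported study hours per week
wharton = rng.integers(0, 2, n)         # 1 = Wharton, 0 = another Penn school
northeast = rng.integers(0, 2, n)       # 1 = raised in the Northeast

# Fabricated "true" relationship plus noise, just to have something to fit.
gpa = (0.8 + 0.0012 * sat + 0.02 * hours + 0.15 * wharton
       + 0.10 * northeast + rng.normal(0, 0.25, n))

X = sm.add_constant(np.column_stack([sat, hours, wharton, northeast]))
fit = sm.OLS(gpa, X).fit()
print(fit.summary(xname=["const", "SAT", "hours", "Wharton", "Northeast"]))
```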

Similar to the national studies, in my survey of 100 fraternity brothers the SAT score did a mediocre job of predicting college GPA as a single variable. The key variable in my study was the number of hours studied. You would be surprised by the variance in Ivy Leaguers’ study habits. My survey asked students to estimate their weekly hours studied as 1-10, 10-20, 20-30, or 30-40. My favorite response was: “Is this per semester?” I assumed the student would realize it was per week! Work habits and effort played a critical role in estimating college GPA. Obviously, the college placement office will have difficulty estimating this variable, though difficulty of course load and number of AP classes might help.

The rest of the variables seem obvious. It is much more difficult to get into Wharton than into the other programs at Penn, so it is no surprise that Wharton students were running circles around the non-Wharton students, even after adjusting for SAT scores and hours studied. In addition, it is much more difficult to get into Penn from the Northeast than from other areas of the country.

Very few of the Jews were jocks. Needless to say, my college fraternity sample had plenty of sampling problems.


Strawman — er, I mean — Strawperson

May 22, 2008

The American Association of University Women released a report this week attempting to debunk concerns that have been raised about educational outcomes for boys. The AAUW report received significant press coverage, including articles in the WSJ and NYT.

But the AAUW report simply debunks a strawman — er, I mean — strawperson. The report defines its opponents this way: “many people remain uncomfortable with the educational and professional advances of girls and women, especially when they threaten to outdistance their male peers.” Really? What experts or policymakers have articulated that view? The report never identifies or quotes its opponents, so we are left with only the Scarecrow as our imaginary adversary.

Once this strawperson is built, it’s easy for the report to knock it down. The authors argue that there’s no “boy crisis” because boys’ educational outcomes have not declined, or have made gradual gains, over the last few decades. And the gap between outcomes for girls and boys has not grown significantly larger.

This is all true, as far as it goes, but it does not address the actual claims that are made about problems with the education of boys.  For example, Christina Hoff Sommers’ The War Against Boys claims: “It’s a bad time to be a boy in America… Girls are outperforming boys academically, and girls’ self-esteem is no different from boys’. Boys lag behind girls in reading and writing ability, and they are less likely to go to college.”  Sommers doesn’t say that boys are getting worse or that the gap with girls is growing.  She only says that boys are under-performing and deserve greater attention. 

Nothing in the new AAUW report refutes those claims. In fact, the evidence in the report clearly supports Sommers’ thesis. If we look at 17-year-olds, who are the end product of our K-12 system, we find that boys trail girls in reading by 14 points on the most recent administration of the Long-Term NAEP in 2004 (see Figure 1 in AAUW). In 1971 boys trailed by 12 points. And in 2004 boys were 1 point lower than they were in 1971.

In math the historic advantage that boys have had is disappearing.  In 1978 17-year-old boys led girls by 7 points on the math NAEP, while in 2004 they led by 3 points.  (See Figure 2 in AAUW)  Both boys and girls made small improvements since 1978, but none since 1973.

Boys also clearly lag girls in high school graduation rates. According to a study I did with Marcus Winters, 65% of boys in the class of 2003 graduated with a regular diploma, versus 72% of girls. Boys also lag girls in the rates at which they attend and graduate from college, while exceeding girls in imprisonment, suicide, and violent death.

It takes extraordinary effort by the AAUW authors to spin all of this as refuting a boy crisis. They focus on the fact that the gap is not always growing larger and that boys are sometimes making gains along with girls. They also try to divert attention by saying that the gaps by race/ethnicity and income are more severe. But no amount of spinning can obscure the basic fact that boys are doing quite poorly in our educational system and deserve some extra attention.

To check out what other bloggers are saying about this report, see Joanne Jacobs and, just this morning, Carrie Lukas in National Review Online.


Ask Reid Lyon

May 13, 2008

(Guest Post by Reid Lyon)

“How did scientific research become influential in guiding federal education policy given the field’s historical reliance on ideology, untested assumptions, anecdotes, and superstition to inform both policy and practice?”

It has not been an easy journey. In fact, it’s like getting a root canal every other week. What makes it tough is that you are always bumping up against the anti-scientific thinking that has had a misguided influence on the perceived value of research throughout the history of education, and increasingly over the past two decades. Many researchers have tried to infuse scientific research into education policy over the years, but it never gained political traction. Jeanne Chall gave her career to this cause, but the political will was never there. Many at the policy level rarely listened to her, much less took her advice. Chall told me frequently that by not basing reading instruction on research we do grave harm to the students education seeks to serve. I repeated her wisdom every time I testified before congressional committees. I also repeated, time and again, that education, like other sectors that serve the public, must have reliable information about what works, why it works, and how it works. The alternative was basically to throw mud against a wall and see what sticks – a practice in place for a very long time. I would argue that scientific research and the dissemination of reliable information to the educational community are non-negotiable, given that all sectors of a productive society depend on an educated workforce. To be sure, many in the education community got medieval on me for holding to this position.

But logic, congressional testimony, research syntheses, and policy papers were not going to change the culture in education, which had reinforced an “everything and anything goes” spirit for the past century. Infusing research into policy and practice was going to take strong support from a senior member or members of Congress who could argue the need in a compelling way. Bill Goodling, former chair of the House Education and the Workforce Committee, did just that, and in 1996 began to support the concept of “research-based education.” Goodling was a former educator and was floored when he began to delve into the fact that millions of kids could not read. His staff learned that the NIH had been studying reading development and reading difficulties since 1965, so they called me in early 1996 to brief the chairman on what we knew about reading from a research standpoint. At that time, I directed the NICHD Reading Research Program at the NIH. During the briefing, he was literally taken aback to learn that NICHD/NIH had studied over 40,000 good and not-so-good readers, many of them over time, and that we had a good idea of what it took to learn to read and what to do about reading difficulties. He could not understand why there was such a massive gap between what research had demonstrated about reading development and instruction and what was actually taught to teachers and implemented in schools.

1996 turned out to be a pretty important year for bringing the massive reading-failure issue before the public and mobilizing scientific efforts. It was also an important year for laying the foundation for research-based education policy as it is reflected in federal legislation today. President Clinton called attention to the tragedy of reading failure in his State of the Union address that year, which clearly put the problem on congressional radar screens. In the same year, the Department of Education and the NICHD supported the convening of a National Research Council (NRC) panel to synthesize and summarize research on the prevention of reading difficulties. At the same time, state leaders were becoming interested in the research-to-policy-and-practice issue. In 1996, then-Texas governor George Bush asked me and members of several strong research teams in Texas and around the country to brief him on how scientific research in reading could help reduce reading failure in Texas. In one of those meetings he asked a pretty prescient question about how scientific research could help kids whose first language was Spanish learn to listen, speak, read, and write in English. That question gave birth to the NICHD national “Spanish to English” study carried out at multiple sites across the country.

But during that year it was Goodling and his staff who went to work on the specifics and on the need to educate other members of Congress, not only about the drastic need to address the reading issue but also about the role of scientific research in solving educational problems. He and his staff devoted substantial time in 1996 to reviewing the NICHD reading research. In early 1997, he and his counterparts in the Senate held hearings on literacy development and the role of scientific research in developing and implementing effective instructional practices. It came as a surprise to me that, in my testimony that year before both House and Senate committees, members asked about research on reading and how it could help guide policy and practice. Their interest in using scientific research to guide practice and policy would later extend to education programs beyond reading, as I was asked to cover the issue in testimony on the Title I, Head Start, and IDEA reauthorizations that took place over the next nine years. And Goodling was the first legislator to formally infuse scientific research in reading into a federal education program: in 1998, he sponsored the Reading Excellence Act, which for the first time required that federal funding be contingent on states and local districts using scientifically based programs.

To further underscore the interest and commitment that Congress had in using research to guide federal education policy, Senator Thad Cochran and Representative Anne Northup asked the NICHD in 1998 to convene a National Reading Panel (NRP) to build on the findings of the 1996 NRC panel on preventing reading difficulties in young children. The NRP was tasked with reviewing research on reading instruction to identify the types of programs and principles that were most effective in improving reading proficiency. While the NRC and NRP reports were initiated and published during the Clinton administration, the Bush administration used the findings not only to craft Reading First but also as an example of the overarching principle that educational policy and instructional practices should be predicated on research. From this principle evolved the establishment of the Institute of Education Sciences, the NRC report on “Scientific Research in Education,” the Partnership for Reading (which served as a resource for disseminating scientific research findings), and the What Works Clearinghouse. Private groups such as the Council for Excellence in Government, which established the Coalition for Evidence-Based Policy, began to contribute to this effort as well.

Taken together, the recent influx of educational science into policy came about through a concerted effort to solve a national reading problem. Using research to guide educational policy and program development has now been extended far beyond reading. A series of actions sent the explicit message that research-based policies and programs were to be the rule, not the exception: congressional hearings, funding of research reports on science in education, requirements that federal funds be contingent on the use of research-based programs and approaches, legislation such as the Education Sciences Reform Act of 2002, and the building of a federal infrastructure that includes the Institute of Education Sciences and the What Works Clearinghouse. Much of this integration of actions and events was strategic and designed to provide a role for scientific research in education. A research-to-policy-and-practice culture had to be strengthened through federal legislation and in the scientific infrastructure within the Department of Education.

Time will tell whether the gains made in using research to guide education policy will last. History tells us that education is impatient and subject to fads, superstition, anecdotes, and the next magic bullet. To be sure, education is more political than scientific, and subject to all the negatives that the political world brings but few of the positives. And many do not understand that, by its canons, evidence is apolitical. There is a tendency to forget that research is not only essential for informing policy but also critical for improving policies and programs once they are in place. But trial and error has become a habit in education, and it will take real courage and persistence to overcome it. In a sense, the world of education policy is like a Slinky: it can expand to take new steps, but it ultimately recoils back to its original configuration. All that said, I am optimistic.


Vouchers: Evidence and Ideology

May 8, 2008

(Guest post by Greg Forster)

 

Lately, Robert Enlow and I at the Friedman Foundation for Educational Choice have had to spend a lot of time responding to the erroneous claims Sol Stern has been making about school choice. I honestly hate to be going up against Sol Stern right at the moment when he’s doing important work in other areas. America owes Stern a debt for doing the basic journalistic work on Bill Ayers that most journalists covering the presidential race didn’t seem interested in doing.

 

But what can we do? We didn’t choose this fight. If Stern is going to make a bunch of false claims about school choice, it’s our responsibility to make sure people have access to the facts and the evidence that show he’s wrong.

 

That’s why Enlow and I have focused primarily on using data and evidence to demonstrate that Stern’s claims are directly contrary to the known facts. It’s been interesting to see how Stern and his defenders are responding.

 

I’ve been saddened at how little effort Stern and his many defenders are devoting to seriously addressing the evidence we present. For example, all the studies of the effects of vouchers on public schools that were conducted outside the city of Milwaukee have been completely ignored both by Stern and by every one of his defenders I’ve seen so far. Does evidence outside Milwaukee not count for some reason? Since most of the studies on this subject have been outside Milwaukee, this arbitrary focus on Milwaukee is hard to swallow.

 

And what about the studies in Milwaukee? All of them had positive findings: vouchers improve public schools. Unfortunately, Stern and his defenders fail to engage with these studies seriously.

 

Stern had argued in his original article that school choice doesn’t improve public schools, on grounds that the aggregate performance of schools in Milwaukee is still bad. His critics pointed out that a large body of high quality empirical research found that vouchers have a positive effect on public schools, both in Milwaukee and elsewhere. If Milwaukee schools are still bad, that doesn’t prove vouchers aren’t helping; and since a large body of high quality empirical research says they do help, the obvious conclusion to reach – if we are going to be guided by the data – is that other factors are dragging down Milwaukee school performance at the same time vouchers are pulling it upward.

 

If an asthma patient starts using medicine, and at the same time takes up smoking, his overall health may not improve. But that doesn’t mean the medicine is no good. I also think that there may be a “neighborhood effect” in Milwaukee, since eligibility for the program isn’t spread evenly over the whole city.

 

There’s new research forthcoming in Milwaukee that I hope will shed more light on the particular reasons the city’s aggregate performance hasn’t improved while vouchers have exerted a positive influence on it. The important point is that all the science on this subject (with one exception, in D.C., which I’ve been careful to take note of when discussing the evidence) finds in favor of vouchers.

 

In Stern’s follow-up defense of his original article, his “response,” if you can call it that, is to repeat his original point – that the aggregate performance of schools citywide in Milwaukee is still generally bad.

 

He disguises his failure to respond to his critics’ argument by making a big deal out of dates. He says that all the studies in Milwaukee are at least six years old (which is actually not very old by the standards of education research), and then provides some more recent data on the citywide aggregate performance of Milwaukee schools. But this obviously has nothing to do with the question; Stern’s critics agree that the aggregate data show Milwaukee schools are still bad. The question is whether vouchers exert a positive or negative effect. Aggregate data are irrelevant; only causal studies can address the question.

 

Of course it’s easy to produce more up-to-date data if you’re not going to use scientific methods to distinguish the influence of different factors and ensure the accuracy of your analysis. If you don’t care about all that science stuff, there’s no need to wait for studies to be conducted; last year’s raw data will do fine.

 

Weak as this is, at least it talks about the evidence. The response to our use of facts and evidence has overwhelmingly been to accuse school choice supporters of ideological closed-mindedness. Although we are appealing to facts and evidence, we are accused of being unwilling to confront the facts and evidence – accused by people who themselves do not engage with the facts and evidence to which we appeal.

 

Stern, for example, complains at length that “school choice had become a secular faith, requiring enforced discipline” and “unity through an enforced code of silence.” Apparently when we demonstrate that his assertions are factually false, we are enforcing silence upon him. (We’ve been so successful in silencing Stern that he is now a darling of the New York Times. If he thinks this is silence, he should get his hearing checked.)

 

Similarly, when Stern’s claims received uncritical coverage from Daniel Casse in the Weekly Standard, Enlow and Neal McCluskey wrote in to correct the record. Casse responded by claiming, erroneously, that Stern had already addressed their arguments in his rebuttal.

 

Casse also repeated, in an abbreviated form, Stern’s non-response on the subject of the empirical studies in Milwaukee – and in so doing he changed it from a non-response to an error. He erroneously claims that Stern responded to our studies by citing the “most recent studies.” But Stern cites no studies; he just cites raw data. It’s not a study until you conduct a statistical analysis to distinguish the influence of particular factors (like vouchers) from the raw aggregate results – kind of like the analyses conducted in the studies that we cite and that Stern and Casse dismiss without serious discussion.

 

Casse then praised Stern’s article because “it dealt with the facts on the ground” and accused school choice supporters of “reciting the school choice catechism.”

 

Greg Anrig, in this Washington Monthly article, actually manages to broach the subject of the scientific quality of one of the Milwaukee studies. Unfortunately, he doesn’t cite any of the other research, in Milwaukee or elsewhere, examining the effect of vouchers on public schools. So if you read his article without knowing the facts, you’ll think that one Milwaukee study is the only study that ever found that vouchers improve public schools, when in fact there’s a large body of consistently positive research on the question.

 

Moreover, Anrig’s analysis of the one Milwaukee study he does cite is superficial. He points out that the results in that study may be attributable to the worst students leaving the public schools. Leave aside that this is unlikely to be the case, and even less likely to account for the entire positive effect the study found. The more important point is that there have been numerous other studies of this question that use methods allowing researchers to examine whether this is driving the results. Guess what they find.

 

Though he ignores all but one of the studies cited by school choice supporters, shuffling all the rest offstage lest his audience become aware of the large body of research with positive findings on vouchers, Anrig cites other studies that he depicts as refuting the case for vouchers. Like Stern’s citation of the raw data in Milwaukee, these other studies in fact are methodologically unable to examine the only question that counts – what was the specific impact of vouchers, as distinct from the raw aggregate results? (I’m currently putting together a full-length response to Anrig’s article that will go over the specifics on these studies, but if you follow education research you already know about them – the notoriously tarnished HLM study of NAEP scores, the even more notoriously bogus WPRI fiasco, etc.)

 

But Anrig, like his predecessors, is primarily interested not in the quality of the evidence but in the motives of school choice supporters. He spends most of his time tracing the sinister influence of the Bradley Foundation and painting voucher supporters as right-wing ideologues.

 

And these are the more respectable versions of the argument. In the comment sections here on Jay P. Greene’s Blog, Pajamas Media, and Joanne Jacobs’s site, much the same argument is put in a cruder form: you can’t trust studies that find school choice works, because after all, they’re conducted by researchers who think that school choice works.

 

(Some of these commenters also seem to be confused about the provenance and data sources of these studies. I linked to copies of the studies stored in the Friedman Foundation’s research database, but that doesn’t make them Friedman Foundation studies. As I stated, they were conducted at Harvard, Princeton, etc. And at one point I linked to an ELS study I did last year that also contained an extensive review of the existing research on school choice, but that doesn’t mean all the previous studies on school choice were ELS studies.)

 

What is one to make of all this? The more facts and evidence we provide, the more we’re accused of ignoring the facts and evidence – by people who themselves fail to address the facts and evidence we provide.

 

I’m tempted to say that there’s a word for that sort of behavior. And there may be some merit in that explanation, though of course I have no way of knowing. But I also think there’s something else going on as well.

 

One prominent blogger put it succinctly to me over e-mail. The gist of his challenge was something like: “Why don’t you just admit that all this evidence and data is just for show, and you really support school choice for ideological reasons?”

 

I think this expresses an idea that many people have – that there is “evidence” over here and then there is “ideology” over there, and the two exist in hermetically sealed containers and can never have any contact with one another. (Perhaps this tendency is part of the long-term damage wrought by Max Weber’s misuse of the fact/value distinction, but that’s a question for another time.)

 

On this view, if you know that somebody has a strong ideology, you have him “pegged” and can dismiss any evidence he brings in support of his position as a mere epiphenomenon. The evidence is a distraction from your real task, which is to identify and reveal the pernicious influence of his ideology on his thinking. Hence the widespread assumption that when a school choice supporter brings facts and evidence, there is no need to trouble yourself addressing all that stuff. Why bother? The point is that he’s an ideologue; the facts are irrelevant.

 

But, as I explained to the blogger who issued that challenge, evidence and ideology are not hermetically sealed. Ideology includes policy preferences, but those policy preferences are always grounded in a set of expectations about the way the world works. In fact, I would say that an “ideology” is better defined as a set of expectations about how the world works than as a set of policy preferences. (That would help explain, for example, why we still speak of differences between “liberal” and “conservative” viewpoints even on issues like immigration where there are a lot of liberals and conservatives on both sides.) And our expectations about how the world works are subject to verification or falsification by evidence.

 

So, for example, I hold an ideology that says (broadly speaking) that freedom makes social institutions work better. That’s one of the more important reasons I support school choice – because I want schools (all schools, public and private) to get better, and I have an expectation that when educational freedom is increased, schools will improve. My ideology is subject to empirical verification. If school choice programs do in fact make public schools better – as the empirical studies consistently show they do – then that is evidence that supports my ideology.

 

Even the one study that has ever shown that vouchers didn’t improve public schools, the one in D.C., also confirms my ideology. The D.C. program gives cash bribes to the public school system to compensate for lost students, thus undermining the competitive incentives that would otherwise improve public schools – so the absence of a positive voucher impact is just what my ideology would predict.

 

Other evidence may also be relevant to the truth or falsehood of my ideology, of course. The point is that evidence is relevant, and truth or falsehood is the issue that matters.

 

Now, as I’ve already sort of obliquely indicated, my view that freedom makes things work better is not the only reason I support school choice. But it is one of the more important reasons. So, if you somehow proved to me that freedom doesn’t make social institutions work better, I wouldn’t immediately disavow school choice, since there are other reasons besides that to support it. However, I would have significantly less reason to support it than I did before.

 

If we really think that evidence has nothing to do with ideology, I don’t see how we avoid the conclusion that people’s beliefs have nothing to do with truth or falsehood – ultimately, that all human thought is irrational. Bottom line, you aren’t entitled to ignore your opponent’s evidence, or dismiss it as tainted because it is cited by your opponent.

 

UPDATE: See this list of complete lists of all the empirical research on vouchers.

 

Edited for typos


It’s Only a Flesh Wound

April 29, 2008

(Guest post by Ryan Marsh)

Many reform strategies are predicated on the belief that teachers have the largest impact on student achievement and that we can measure a teacher’s contribution with reasonable accuracy. Policies such as performance pay, or other efforts to recruit and retain effective teachers, require reasonably accurate identification of which teachers add the most to their students’ achievement and which add the least.

Value-added models, or VAMs, are the statistical models commonly used for this purpose. VAMs attempt to estimate teacher effectiveness by controlling for prior achievement and other student characteristics.
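
As a concrete illustration, here is a bare-bones sketch of how a VAM is often estimated, under simplifying assumptions (a single prior test score, one student-level control, no shrinkage, and hypothetical file and column names): regress current scores on prior scores and student characteristics plus teacher indicators, and treat the teacher coefficients as the value-added estimates.

```python
# A bare-bones value-added sketch (hypothetical data file and column names;
# real VAMs typically add more controls and shrink the estimates).
import pandas as pd
import statsmodels.formula.api as smf

# Expected columns: score, prior_score, low_income, teacher_id
df = pd.read_csv("student_year_data.csv")

vam = smf.ols("score ~ prior_score + low_income + C(teacher_id)", data=df).fit()

# Each C(teacher_id)[T.x] coefficient is that teacher's estimated effect,
# relative to the omitted reference teacher.
teacher_effects = vam.params.filter(like="C(teacher_id)")
print(teacher_effects.sort_values(ascending=False).head())
```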

Two recent working papers have started a very important debate about the use of VAMs, a debate that will greatly influence future education policy and research. Economist Jesse Rothstein has a working paper in which he performs a critical analysis of VAMs and their ability to estimate teacher effectiveness. His analysis focuses on the question of whether students are randomly assigned to teachers. If they are not, then the results of a VAM should not necessarily be interpreted as causal estimates of teacher effectiveness. That is, if some teachers are non-randomly assigned students who will learn at a faster rate than others, then our estimates of who is an effective teacher could be biased.

Without getting too technical, Rothstein checks to see whether a student’s future teacher can predict the student’s past or present scores. If a teacher can “predict” growth in achievement for students before he or she becomes their teacher, then we have evidence of non-random assignment of students to teachers. After all, teachers cannot have caused things that happened in the past.
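
Here is a rough sketch of that falsification logic (not Rothstein’s exact specification; the data file and column names are hypothetical): regress this year’s gains on indicators for each student’s future teacher and test whether those indicators are jointly significant.

```python
# A rough sketch of the falsification idea: if next year's teacher "predicts"
# this year's gains, students are probably not randomly assigned to teachers.
# Hypothetical data file and column names.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("student_year_data.csv")
df["gain"] = df["score"] - df["prior_score"]        # this year's gain

falsification = smf.ols("gain ~ C(next_year_teacher_id)", data=df).fit()

# With only the future-teacher dummies on the right-hand side, the regression's
# overall F-test is the joint test; a small p-value suggests non-random sorting.
print(f"Joint F-test p-value: {falsification.f_pvalue:.4f}")
```

A small p-value on the joint test is evidence of sorting, not proof that the VAM estimates are badly distorted, which is exactly the issue taken up next.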

But even if we have bias in VAMs from non-random assignment of students to teachers, the question is how seriously distorted are our assessments of who is an effective teacher.  Many measures have biases and imperfections, but we still rely on them because the distortions are relatively minor.  Rothstein recognizes this when he suggests on p. 32 a way of assessing the magnitude of the bias:

“An obvious first step is to compare non-experimental estimates of individual teachers’ effects in random assignment experiments with those based on pre- or post- experimental data (as in Cantrell, Fullerton, et. al 2007).”

The working paper he cites—by Steven Cantrell, Jon Fullerton, Thomas J. Kane, and Douglas O. Staiger—uses data from an experimental analysis of National Board for Professional Teaching Standards (NBPTS) certification. In the experiment, NBPTS applicant teachers are paired with non-applicant comparison teachers in the same school, and principals set up two classrooms, either of which they would be willing to assign to the NBPTS teacher. One class is then randomly assigned to each teacher and compared with the class not chosen. The paper also uses VAMs to assess teacher effectiveness before the experiment was run, and these prior effectiveness estimates were used to predict how much better a teacher’s students performed during the experiment than students in the comparison classrooms. This allows the researchers to test how well the VAM estimates hold up against a random-assignment experiment.

That is, teacher effectiveness was measured using VAMs before students were randomly assigned to teachers and then teacher effectiveness was measured after students were randomly assigned, when no bias would be present.  The two correlate well, suggesting little distortion from the non-random assignment.  As the authors conclude, the VAM estimates have “considerable predictive power in predicting student achievement during the experiment.”
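
The comparison itself is simple to express; a sketch (with a hypothetical file of one row per teacher) is to correlate each teacher’s pre-experiment VAM estimate with the experimental estimate obtained under random assignment.

```python
# A sketch of the comparison: correlate non-experimental VAM estimates with the
# experimental (random-assignment) estimates. Hypothetical file and column names.
import pandas as pd

teachers = pd.read_csv("teacher_estimates.csv")  # one row per teacher
r = teachers["vam_pre_experiment"].corr(teachers["experimental_effect"])
print(f"Correlation between VAM and experimental estimates: {r:.2f}")
```

A correlation near one would indicate that non-random assignment does little to distort the VAM rankings.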

In short, Rothstein raises a potentially lethal concern for policies based on value-added models, but the paper by Cantrell et al. suggests that the concern may be little more than a flesh wound.


Surprise! What Researchers Don’t Know about Florida’s Vouchers

April 21, 2008

(Guest post by Greg Forster)

 

Florida’s A+ program, with its famous voucher component, has been studied to death. Everybody finds that the A+ program has produced major improvements in failing public schools, and among those who have tried to separate the effect of the vouchers from other possible impacts of the program, everybody finds that the vouchers have a positive impact. At this point our understanding of the impact of A+ vouchers ought to be pretty well-formed.

 

But guess what? None of the big empirical studies on the A+ program has looked at the program’s impact after 2002-03. That was the year in which large numbers of students became eligible for vouchers for the first time, so it’s natural that a lot of research would be done on the impact of the program in that year. Still, you would think somebody out there would be interested in finding out, say, whether the program continued to produce gains in subsequent years. In particular, you’d think people would be interested in finding out whether the program produced gains in 2006-07, the first school year after the Florida Supreme Court struck down the voucher program in a decision that quickly became notorious for its numerous false assumptions, internal inconsistencies, factually inaccurate assertions and logical fallacies.

 

Yet as far as I can tell, nobody has done any research on the impact of the A+ program after 2002-03. Oh, there’s a study that tracked the schools that were voucher-eligible in 2002-03 to see whether the gains made in those schools were sustained over time. But that gives us no information about whether the A+ program continued to produce improvements in other schools that were designated as failing in later years. For some reason, nobody seems to have looked at the crucial question of how vouchers impacted Florida public schools after 2002-03.

 

[format=shameless self-promotion]

 

That is, until now! I recently conducted a study that examines the impact of Florida’s A+ program separately in every school year from 2001-02 through 2006-07. I found that the program produced moderate gains in failing Florida public schools in 2001-02, before large numbers of students were eligible for vouchers; big gains in 2002-03, when large numbers of students first became eligible for vouchers; significantly smaller but still healthy gains from 2003-04 through 2005-06, when artificial obstacles to participation blocked many parents from using the vouchers; and only moderate gains (smaller even than the ones in 2001-02) after the vouchers were removed in 2006-07.

 

[end format=shameless self-promotion]
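
For the statistically inclined, here is a hedged sketch of one way the year-by-year pattern just described could be estimated (this is not necessarily the study’s actual specification, and the data file and column names are hypothetical): interact an indicator for voucher-eligible, F-graded schools with year dummies in a school-level panel regression.

```python
# A hedged sketch (not necessarily the study's actual specification) of estimating
# the program's effect separately by year: interact an F-grade/voucher-eligibility
# indicator with year dummies in a school-level panel. Hypothetical file/columns.
import pandas as pd
import statsmodels.formula.api as smf

schools = pd.read_csv("florida_school_panel.csv")  # one row per school per year

model = smf.ols(
    "score_gain ~ C(year) + f_grade_prior_year:C(year) + prior_score",
    data=schools,
).fit()

# Each interaction coefficient is the estimated effect of failing-school status
# (and hence voucher exposure) in that particular year.
print(model.params.filter(like="f_grade_prior_year"))
```

Estimated this way, the interaction coefficients would trace out the year-by-year pattern described above.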

 

It seems to me that this is even stronger evidence than was provided by previous studies that the public school gains from the A+ program were largely driven by the healthy competitive incentives provided by vouchers. The A+ program did not undergo significant changes from year to year between 2001-02 and 2006-07 that would explain the dramatic swings in the size of the effect – except for the vouchers. In each year, the positive effects of the A+ program track the status of vouchers in the program. If the improvements in failing public schools are not primarily from vouchers, what’s the alternative explanation for these results?

Obviously the most newsworthy finding is that the A+ program is producing much smaller gains now that the vouchers are gone. But we should also look more closely at the finding that the program produced smaller (though still quite substantial) gains in 2003-04 through 2005-06 than it did in 2002-03.

 

As I have indicated, I think the most plausible explanation is the reduced participation rates for vouchers during those years, attributable to the many unnecessary obstacles that were placed in the path of parents wishing to use the vouchers. (These obstacles are detailed in the study; I won’t summarize them here so that your curiosity will drive you to go read the study.) While the mere presence of a voucher program might be expected to produce at least some gains – except where voucher competition is undermined by perverse incentives arising from bribery built into the program, as in the D.C. voucher – it appears that public schools may be more responsive to programs with higher participation levels.

 

There’s a lot that could be said about this, but the thing that jumps to my mind is this: if participation rates do drive greater improvements in public schools, we can reasonably expect that once we have universal vouchers, the public school gains will be dramatically larger than anything we’re getting from the restricted voucher programs we have now.

 

One more question that deserves to be raised: how come nobody else bothered to look at the impact of the A+ program after 2002-03 until now? We should have known a long time ago that the huge improvements we saw in that year got smaller in subsequent years.

 

It might, for example, have caused Rajashri Chakrabarti to modify her conclusion in this study that failing-schools vouchers can be expected to produce bigger improvements in public schools than broader vouchers. In this context it is relevant to point out that many of the obstacles that blocked Florida parents from using the vouchers arose from the failing-schools design of the program. Chakrabarti does great work, but the failing-schools model introduces a lot of problems that will generally keep participation levels low even when the program isn’t being actively sabotaged by the state department of education. If participation levels do affect the magnitude of the public school benefit from vouchers, then the failing-schools model isn’t so promising after all.

 

So why didn’t we know this? I don’t know, but I’ll offer a plausible (and conveniently non-falsifiable) theory. The latest statistical fad is regression discontinuity, and if you’re going to do regression discontinuity in Florida, 2002-03 is the year to do it. And everybody wants to do regression discontinuity these days. It’s cutting-edge; it’s the avant-garde. It’s like smearing a picture of the virgin Mary with elephant dung – except with math.

 

You see the problem? It’s like the old joke about the guy who drops his keys in one place but looks for them in another place because the light is better there. I think the stats profession is constantly in danger of neglecting good research on urgent questions simply because it doesn’t use the latest popular technique.

 

I don’t want to overstate the case. Obviously the studies that look at the impact of the A+ program in 2002-03 are producing real and very valuable knowledge, unlike the guy looking for his keys under the street lamp (to say nothing of the elephant dung). But is that the only knowledge worth having?

 

(Edited to fix a typo and a link.)