Nominations Solicited for the 2018 Al Copeland Humanitarian Award

October 5, 2018


It is time once again for us to solicit nominations for the Al Copeland Humanitarian Award.  The criteria of the Al Copeland Humanitarian Award can be summarized by quoting our original blog post in which we sang the praises of Al Copeland and all that he did for humanity:

Al Copeland may not have done the most to benefit humanity, but he certainly did more than many people who receive such awards.  Chicago gave Bill Ayers its Citizen of the Year award in 1997.  And the Nobel Peace Prize has too often gone to a motley crew including unrepentant terrorist, Yasser Arafat, and fictional autobiography writer, Rigoberta Menchú.  Local humanitarian awards tend to go to hack politicians or community activists.  From all these award recipients you might think that a humanitarian was someone who stopped throwing bombs… or who you hoped would picket, tax, regulate, or imprison someone else.

Al Copeland never threatened to bomb, picket, tax, regulate, or imprison anyone.  By that standard alone he would be much more of a humanitarian.  But Al Copeland did even more — he gave us spicy chicken.

Last year’s winner of “The Al” was Stanislav Petrov, who literally saved the world from nuclear destruction by refusing to follow Soviet orders to retaliate against what he suspected (and was later confirmed) to be a false warning of a US strike.  It’s not quite spicy chicken, but it’s close. Petrov was selected from an excellent set of nominees, including Whittaker Chambers, Justin Roiland and Dan Harmon, and Russ Roberts.

The previous year’s winner of “The Al” was Master Sergeant Roddie Edmonds, who prevailed over a very competitive field of nominees, including Tim and Karrie League, Remy Munasifi, and Yair Rosenberg.  Edmonds stood up against fascists at considerable risk to himself by declaring that he and all of his fellow prisoners of war were Jews, foiling the Nazis’ effort to separate Jewish prisoners.  It is this type of courage in the face of illiberalism that we need more of in these times.

The 2015 winner of “The Al” was the internet humorist, Ken M.  Ken M did more to improve the human condition than just make us laugh by making idiotic comments on social media (although that would have been enough).  His humor reveals the ridiculousness of trying to change the world by arguing with people on the internet.  Given how much time ed reformers waste on social media, especially Twitter, Ken M’s humor is a useful reminder that many of the people reading your posts are probably not much swifter or more influential than the Ken M persona.  Ken M beat a set of strong nominees, including Malcolm McLean, Gary Gygax, and John Lasseter.

The 2014 winner was Peter DeComo, the inventor of the Hemolung Respiratory Assist System.  To save a life, DeComo had to trick border control officials into letting him bring a model of his artificial lung machine into the US from Canada because the device had not yet been fully approved by the FDA.  DeComo won over a worthy field, including Marcus Persson, the inventor of Minecraft, Ira Goldman, the developer of the “Knee Defender,” Thomas J. Barratt, the father of modern advertising, and Thibaut Scholasch and Sébastien Payen, wine-makers who improved irrigation methods.

The 2013 winner of “The Al” was Weird Al Yankovic.  Weird Al beat an impressive set of nominees, including Penn and Teller, Kickstarter, and Bill Knudsen.

The 2012 winner of “The Al” was George P. Mitchell, a pioneer in the use of fracking to obtain more cheap, clean natural gas. Mitchell won over a group of other worthy nominees:  Banksy, Ransom E. Olds, Stan Honey, and Alfred Fielding and Marc Chavannes.

In 2011 “The Al” went to Earle Haas, the inventor of the modern tampon.  Thanks to Anna for nominating him and recognizing that advances in equal opportunity for women had as much or more to do with entrepreneurs than government mandates.  Haas beat his fellow nominees:  Charles Montesquieu, the political philosopher, David Einhorn, the short-seller, and Steve Wynn, the casino mogul.

The 2010 winner of “The Al” was Wim Nottroth, the man who resisted Rotterdam police efforts to destroy a mural that read “Thou Shall Not Kill” following the murder of Theo van Gogh by an Islamic extremist.  He beat out The Most Interesting Man in the World, the fictional spokesman for Dos Equis and model of masculine virtue, Stan Honey, the inventor of the yellow first down line in TV football broadcasts, Herbert Dow, the founder of Dow Chemical and subverter of a German chemicals cartel, and Marion Donovan and Victor Mills, the developers of the disposable diaper.

And the 2009 winner of “The Al” was Debrilla M. Ratchford, who significantly improved the human condition by inventing the rollerbag.  She won over Steve Henson, who gave us ranch dressing, Fasi Zaka, who ridiculed the Taliban, Ralph Teetor, who invented cruise control, and Mary Quant, who popularized the miniskirt.

Nominations can be submitted by emailing a draft of a blog post advocating for your nominee.  If I like it, I will post it with your name attached.  Remember that the basic criterion is that we are looking for someone who significantly improved the human condition even if they made a profit in doing so.  Helping yourself does not nullify helping others.  And, like Al Copeland, nominees need not be perfect or widely recognized people.


New Field Trip Study

October 1, 2018

The National Art Education Association and the Association of Art Museum Directors just released a new study examining the effects of student field trips to art museums.  The study looked at outcomes for students who went on a single field trip to one of six different art museums around the country.  Instead of going to the museum, some students received an art museum intervention typically presented by museum staff in their classroom.  And a third group of students received neither the field trip nor the classroom experience and served as the control group.

This new study is a helpful follow-up to the Crystal Bridges study that my colleagues Dan Bowen, Brian Kisida, and I conducted.  We found that students randomly assigned to a single field trip to the Crystal Bridges Museum of American Art outperformed those randomly assigned to a control group on measures of tolerance, empathy, content knowledge and critical thinking about art, as well as their desire to frequent museums in the future.  This new NAEA/AAMD study was designed to see if similar results could be produced by single field trips to other museums or if our findings were somehow particular to Crystal Bridges.

Importantly, the new NAEA/AAMD study does not randomly assign students across its two treatment conditions and one control condition, unlike our previous Crystal Bridges study, which did employ a random assignment research design.  This undermines our ability to draw causal conclusions with confidence, since any differences we observe between treatment and control students may be caused by unobserved, pre-existing differences between the types of students who were non-randomly assigned to treatment and control conditions rather than by the treatment itself.

Despite this limitation, the NAEA/AAMD study is an impressive accomplishment and gives us information about a broader picture of museum field trip programs than we could get by examining just one museum.  And this new study yields some results that are consistent with our earlier experimental work.  In particular, it finds that students who go on field trips to the museum are significantly less likely to agree with the statement: “All people should understand a work of art in the same way.”  Students who received the classroom experience were also less likely to agree with this statement than the control group, but not by as much as those who actually went to the museum.  So there seems to be something about field trips to art museums that makes students more willing to accept different perspectives.

This result is consistent with the tolerance and social perspective effects we observed in both the Crystal Bridges and the live theater studies we have conducted. And it is very similar to one of the items we used in those studies as well as our current Woodruff Arts Center study that asks students whether they agree or disagree with the statement “I think people can have different opinions about the same thing.”  While we are still collecting and analyzing results from Atlanta, I can report that we are finding students who receive three field trips in a single year — one each to the art museum, symphony, and theater — are significantly more likely to agree with this statement than students randomly assigned to a control group.  And amazingly, if students receive a second year of three more field trips, they agree with this statement even more.  It appears that this tolerance benefit of field trips to arts institutions endures and compounds with additional field trip experiences.

Another interesting finding from the new NAEA/AAMD study is that classroom experiences appear to be implemented with much less fidelity than field trip experiences.  It appears that museum educators have better ability to control conditions and do what they intended if the students are at museums rather than in classrooms.  This makes sense and may help explain why the classroom experiences, even when conducted by the same museum staff, have less of an impact.

Lastly, the new NAEA/AAMD study is inconsistent with our previous Crystal Bridges results in that it does not appear that students who go to the museum score significantly higher on a variety of measures that capture their interest in art and museums.  In the Crystal Bridges study we not only found that students expressed a stronger interest in visiting museums in the future, but we were able to track coded coupons that were given to all treatment and control students to observe that treatment students and their families were significantly more likely to attend the museum in the future.  On the other hand, in our live theater study, we only observed a weak effect of going on a field trip to see live theater on student interest in attending theater in the future.  And in the ongoing Woodruff experiment, field trips seem to produce positive consumption effects for some art forms right away but require additional exposure before becoming positive for others. It appears that whether field trips spur future interest in frequenting the arts is complicated and contingent on a variety of factors that we do not yet fully understand.

I applaud the NAEA and AAMD for conducting this research.  Only with repeated examination and attempted replication will we really gain confidence in our understanding of how cultural activity affects students.

(Update — This has been edited to describe the assignment of students to treatment and control conditions in the NAEA/AAMD study more accurately.)


Want More Art Ed? Decentralize School Control

September 14, 2018

I just came back from the National Convening of the Arts Education Partnership.  It was a fantastic gathering of arts advocates, researchers, and practitioners.  I was particularly struck by the comments during the opening session made by Eric Martin, who leads Music for All.  He noted that parents and communities tend to want more arts education than their schools provide.  I suspect he’s right about that, but that raises a puzzle: if parents and communities want more art, why are their schools not providing what they want?

You might think the answer is a lack of funds, but that can’t really explain it.  The arts are not that expensive and if schools were more responsive to parental and community preferences, they would give greater priority to the arts in their budgets and schedules.  And then it dawned on me… schools are not more responsive to parent and community preferences regarding the arts because parents and communities no longer really control their schools.  Schools are increasingly answerable to distant bureaucrats in state or federal departments of education rather than to the parents and communities they serve.

This situation is a disaster for the arts.  Even if distant bureaucrats valued the arts as much as many parents and communities do, bureaucrats cannot give priority to the arts because that is not the basis by which the success or failure of their distant management is judged.  The only systematic, easily available information we have on schools is math and reading test scores.  Narrowing the focus of schools on math and reading test performance is inherent in the effort to manage those schools from a distance.  Parents and communities do not have to rely on math and reading test scores to judge school performance because they are close enough to gather a large amount of contextual information.  By contrast, the state superintendent has no access to this information about quality and is inevitably judged completely on the few bits of test score data we do have about all of the schools in their charge.

If this is correct, the most promising strategy for arts advocates to pursue to expand arts offerings in school would be to favor decentralization of control over schools to parents and communities.  If we want more art, let’s get out of the way of parents and communities that want more art.

The irony is that most of the people at this week’s Arts Education Partnership meeting are very focused on lobbying for policies at the state and federal level that they hope would advance the arts.  There was a lot of discussion of the importance of states adopting a set of national standards regarding arts education.  There were pleas for more funding and support from state departments of education.

All of these measures are sincere efforts by good people working hard on behalf of the arts.  But I suspect that the more arts advocates strengthen centralized control over schools, even if in the name of advancing the arts, the less likely we are to see priority given to the arts in education.  Centralized control requires evaluation by centrally collected metrics, which means an emphasis on math and reading test scores.  This is true no matter how many arts standards are adopted, how many state arts initiatives are adopted, or how many speeches in favor of the arts state officials give.

Arts advocates may want to shift their attention toward strengthening parent and community control over their own schools so those schools are more likely to deliver the arts education that folks really want.


Ed Reform Political Judgment Often Wrong

August 21, 2018

The Ed Reform Establishment tends to favor more highly regulated and targeted school choice programs.  When challenged on the merits of those preferences, they sometimes acknowledge that regulating and targeting choice may not produce better outcomes but they assert that such approaches have political advantages over less regulated and more universal programs.

The string of political failures, from Question 2 in Massachusetts to the inability of portfolio management to catch on (or even sustain itself in New Orleans), suggests that the Ed Reform Establishment lacks sensible political judgment.  But if we need more evidence that Ed Reformers are out of sync with political sentiment, just look at the findings of the new Ed Next Poll (co-authored by our new faculty member, Albert Cheng).

Of course, the way people answer poll questions does not directly translate into what is likely to be politically successful or not given how important political organization and strength of sentiment are in mobilizing opinion into policy.  But opinion polls give us some idea of what sentiment is out there for organizations to try to mobilize.  And political sentiment very clearly goes against the political calculations of the Ed Reform Establishment.

For example, Ed Reform experts tell us charters are more likely to be political winners than private school choice.  But if we look at the polling, vouchers are polling 10 points ahead of charters, with universal vouchers favored by 54% compared to charters favored by 44%. Tax credit private school choice programs are even more heavily supported, despite drawing little interest from the Ed Reform Establishment.

Ed Reform experts tell us that vouchers targeted toward the disadvantaged are more likely to be politically successful than universal programs.  But if we look at the polling, universal vouchers have an 11 percentage point advantage over targeted vouchers, which are only supported by 43% of the sample.

Other darlings of the Ed Reform Establishment also do not poll well.  The establishment bet heavily that general sympathy for standards could be channeled into supporting the specific proposal of Common Core standards.  But once the abstract idea of standards becomes the concrete proposal of Common Core, support drops from 61% to 45%, which is below the Mendoza line of 50% to overcome organized political resistance.

Heavily restricting local autonomy over disciplinary policy to ensure racial equity is also strongly favored by the Ed Reform Establishment, but it is deeply unpopular with the public, including teachers.  Only 27% of the public and 28% of teachers support “federal policies that prevent schools from expelling or suspending black and Hispanic students at higher rates than other students.”  Support for this is barely higher among Hispanic (35%) and African American (42%) respondents.

Lastly, the Ed Reform Establishment is very keen on “managed” enrollment systems that consider race and income in assigning students to schools.  The public does not share this enthusiasm.  Only 18% of the public, 27% of teachers, 24% of Hispanics, and 31% of African Americans think “public school districts [should] be allowed to take the racial background of students into account when assigning students to schools.”  There is even less support for considering income when assigning students to schools.

Why does the Ed Reform Establishment so badly lack an accurate read on what has political support?  I suspect that Ed Reform has increasingly become a vanity project — a way to signal virtue to each other — rather than a movement to make realistic and beneficial changes in policy.  This is exacerbated by a lack of consequences for reformers whose political judgment regularly fails.  We seem to favor accountability for teachers but don’t seem to have much of it within the reform movement.

(Note: I’ve corrected the spelling of judgment.  Judgement is accepted in British English, but is not standard usage.)


Political Bias in Education Policy Research

August 13, 2018


Education policy research is not really a scientific enterprise.  If it were, the field would be equally open to accepting research of equal rigor regardless of the findings.  That is simply not the case.  Research with preferred findings is more easily published in leading journals and embraced by scholars than research supporting less favored results.

There are countless examples of this, but here is one to illustrate the point…

The Journal of Policy Analysis and Management, a top journal in our field, has just published an analysis of vouchers in Indiana based on a matching research design.  Despite the fact that matching is normally intended to produce treatment and comparison groups that are nearly identical on observed characteristics, in this study the treatment group differed significantly from the control group in their pre-treatment measure of math performance.  Specifically, the treatment group has significantly higher scores on math tests.  And the one negative effect observed by the study was on math test scores, which was roughly comparable in magnitude to the amount by which the treatment group was higher on math scores pre-treatment.  So, basically the treatment group reverted to having about the same math scores as the control group once treatment began.  This negative effect, which was really the equalizing of the matched groups, was detected the first time students enrolled in a private school and did not grow in magnitude as students persisted in private school.  One might think that if private schools really harmed math scores, that harm might compound over time, but that did not occur.
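To see why that pattern reads as regression toward the baseline rather than harm, here is a minimal numerical sketch; all of the figures are invented for illustration and are not taken from the Indiana study:

```python
# Hypothetical scores (in SD units); none of these numbers come from the study.
treat_pre, control_pre = 0.60, 0.50    # matched groups start 0.10 apart
treat_post, control_post = 0.55, 0.55  # groups are equal once treatment begins

baseline_gap = treat_pre - control_pre
# A gain-score comparison attributes the closing of the gap to the treatment.
estimated_effect = (treat_post - control_post) - baseline_gap

print(round(baseline_gap, 2))      # 0.1
print(round(estimated_effect, 2))  # -0.1
```

The “effect” here is exactly the baseline imbalance with its sign flipped, which matches the pattern described above: a one-time drop when treatment begins that does not compound over time.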

These results certainly deserve publication and ought to inform the school choice policy debate despite the obvious limitations of the matching design that failed to make the groups comparable on the one outcome measure for which a negative outcome was observed.  While worthy of publication and discussion, it is questionable whether this article deserves publication in one of the field’s top journals and even more doubtful that it should be given as much credence as some folks in the field seem willing to give it.

Corey DeAngelis and Pat Wolf have a similar school choice study based on a matching research design with similar imperfections.  It examines whether students enrolled in the Milwaukee voucher program were more likely to be accused or convicted of a crime in later years than comparable students who had attended Milwaukee’s public schools.  Students in the treatment group were matched to public school students on a number of observable characteristics, including the neighborhood in which they lived.  Despite that matching effort, the treatment and control groups were significantly different, with the treatment group having higher reading scores and being more likely to be female.  Unlike the JPAM study, neither of these variables was the same as the outcome for which effects were observed.  Controlling for observable student and parental characteristics, students who had enrolled in Milwaukee’s voucher program were significantly less likely to be accused of a crime in later years.

The defects of Corey and Pat’s study are similar to those of the JPAM study.  It also uses a matching research design, and as I have said many times before, I don’t think we should have much confidence in matching designs to produce causal inferences.  And like the other study, Corey and Pat’s matching fails to produce treatment and control groups that are similar on all observed characteristics.  But unlike the other study, Corey and Pat’s research is not being published in JPAM.  In fact, JPAM desk rejected Corey and Pat’s study, deeming it unworthy even of being sent out for review.  A number of other journals did the same and they are now struggling to get it published in any journal.  I’m convinced that if only they had found that vouchers increased criminal behavior, their piece would already be in print in a respected journal.  But because they found a positive result for vouchers, the bar is higher and editors and reviewers can rightly note the defects in the study to justify rejection.

All research has limitations that might be invoked to support rejection or overlooked to support publication.  The double-standard used when judging voucher studies with favorable or unfavorable findings is a function of political bias and is an indication that our field is much less scientific than we would like to imagine.

It’s a shame that education policy researchers are largely uninterested in this problem of political bias.  Despite considerable energy devoted to promoting many dimensions of diversity within our field, there is virtually no effort to promote ideological diversity.  My department has a few researchers who would describe themselves as conservatives (while we also have had two faculty members who describe themselves as socialists), but I suspect most departments don’t have any self-described conservatives while others have no more than one or two.

It is interesting to note that despite having a department with six endowed chair holders, half of whom have Harvard doctorates, and all of whom have impressive research records, none of us have ever been asked to serve on the editorial boards of any journals (excluding the Journal of School Choice that my colleague, Bob Maranto, edits).  We’ve tried to play a part in governing our profession, but because we are branded (sometimes incorrectly) as conservatives we have been shunned.  The composition of editorial boards shapes who reviews submissions, which shapes what is published in those journals, which shapes what people in the field imagine the research consensus to be on various issues.

There are consequences to this political bias in our field.  First, the scientific quality of research is harmed by an increasing groupthink that fails to critically examine the key assumptions, methods, and implications of much of the work being produced.  Second, research in the field has diminished credibility and policy influence because others increasingly look at the field as more ideological and less scientific.  Some of the leading people in our field regularly take to Twitter to deride policymakers and the public for failing to heed what they believe research has to say. But why should policymakers obey “science” when it is being produced by an increasingly insular group of researchers who may confuse their political agenda for science? Third, frustrated conservatives are likely to give up trying to be accepted by the dominant professional associations and journals and instead build their own parallel institutions.  The Bar Association drove out conservatives, who built the Federalist Society, which now seems more effective than the “mainstream” organization at exercising policy influence.

I don’t expect this piece to alter this state of affairs.  Leading scholars in our field seem quite adept at defending their prior convictions, sometimes in remarkably unscholarly ways on social media, rather than critically examining their own beliefs and behaviors.  As far as I’m concerned they can rail away, but they will be left with the kind of nasty, unscientific, and irrelevant field they seem determined to build.


Pre-K Helps Test Scores in Short Run But Hurts Them Later

July 16, 2018


The Arnold Foundation’s Straight Talk On Evidence web site provides a very useful summary of a recently published large RCT on a state-funded pre-K program in Tennessee.  Consistent with a previous, nationally representative RCT of Head Start, this study found that students given access to government-funded pre-school by lottery initially score higher on standardized tests than those who lose the lottery, but then fare worse later.

In the TN study, treatment students score higher at the end of pre-K.  But, as the Arnold summary puts it:

At the end of third grade, the study found statistically-significant adverse effects on student math and science achievement. In math, the VPK group scored 0.12 standard deviations lower than the control group, which equates to roughly 13 percent less growth in math achievement than would be expected in the third grade year.[ii] In science, the VPK group scored 0.09 standard deviations lower than the control group, which equates to roughly 23 percent less growth in science achievement than would be expected in the third grade year.[iii]
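As a sanity check on those conversions, one can back out the expected annual growth they imply; the implied growth figures below are my inference, not numbers stated in the summary:

```python
# Effect sizes and growth shares quoted in the Arnold summary above.
math_effect_sd, math_share = 0.12, 0.13  # 0.12 SD ≈ 13% of a year's math growth
sci_effect_sd, sci_share = 0.09, 0.23    # 0.09 SD ≈ 23% of a year's science growth

# Implied expected growth (in SD) during the third-grade year.
implied_math_growth = math_effect_sd / math_share
implied_sci_growth = sci_effect_sd / sci_share

print(round(implied_math_growth, 2))  # 0.92
print(round(implied_sci_growth, 2))   # 0.39
```

The two conversions are internally consistent: a deficit of similar size is a larger share of a year's growth in science simply because students gain less in science than in math during the third-grade year.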

In an effort to explain the negative longer-term result, the authors suggest that special education may be to blame.  Students admitted to the government-funded pre-K program were more likely to be labeled as needing special education services and that designation may have lowered academic expectations.  But this explanation is inconsistent with Hanushek, Kain, and Rivkin’s finding that special education tends to improve test score results.  Straight Talk at least considers the possibility that children being with family or in a non-government-funded pre-school may just be academically superior.

The hard reality is that the process of human development is complex and highly varied, so we just don’t know the optimal arrangements for all children.  Andy Smarick has an excellent piece along these lines in the Weekly Standard, suggesting that education policy experts suffer from a Hayekian information problem.  And this was also the subtext of my post last week on how parents are smarter than Technocrats.  Even when Technocrats are armed with the best science, they generally do not have enough information to centrally plan the lives of others.  This doesn’t mean that we never regulate anything.  It just means that if we do regulate we should do so with great caution and large dollops of humility because the experts are typically missing a lot of important information that the individuals they are regulating are more likely to possess.

But caution and humility are no fun, so the Arnold Foundation’s Straight Talk chooses instead to double-down on Technocracy by suggesting that the disappointing results of pre-school as shown in RCTs of both Head Start and the TN program be remedied by identifying which subset of pre-schools seem to be more effective and regulating programs toward imitating those schools:

The above findings and observations, we believe, underscore the need to reform programs such as VPK and Head Start by incorporating (i) rigorous evaluations aimed at identifying the subset of local approaches that are effective, and (ii) once such approaches are identified, strong incentives or requirements for other local program sites to adopt and faithfully implement them on a larger scale.

Keep in mind that the TN program already has regulations in place meant to ensure quality, including requiring at least 5.5 hours of instructional time per day, a cap of 20 students per classroom, a licensed teacher in each classroom, and the requirement that schools choose among a state-approved set of curricula.  Also keep in mind that short-term test scores, which are the most common tool by which regulators monitor quality, showed positive results.

If these regulatory practices are insufficient to avoid harming students over the medium term, why would Straight Talk believe that doubling down on the Technocratic approach would make things better?  It would be nice if they at least considered the possibility that we are suffering from a Hayekian information problem and may be unable to devise optimal arrangements for education.


Parents are Smart. Technocrats are Dumb

July 12, 2018


The technocratic brand of ed reform that is currently dominant is based on the premise that policy elites, guided by science, need to ensure school quality.  Parents should have choices, but they should only choose among quality options.  Mostly using test scores, technocrats believe they can identify quality schools and quality-promoting educational practices, which should override parental preferences about which schools and practices offer a quality education.

A new study by Diether W. Beuermann and C. Kirabo Jackson suggests that parents may be better at detecting which schools promote long-term positive outcomes for their children than technocrats guided by short-term test scores.  They examine the school system in Barbados, in which parents seek admission for their children into schools they prefer, but those schools use test-score cut-offs to determine which students gain admission.  The cut-offs create a discontinuity that allows for rigorous causal identification of whether students who barely gain admission to a desired school have different outcomes than those with barely lower test scores who are denied admission.
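The logic of that cut-off design can be sketched in a few lines; everything below (the cutoff, effect size, and noise levels) is invented for illustration and is not taken from the Beuermann and Jackson paper:

```python
import random

random.seed(42)

# Stylized sharp regression-discontinuity sketch: admission is determined by
# a test-score cutoff, so comparing students just above vs. just below the
# cutoff isolates the effect of attending the preferred school.
CUTOFF = 70.0
BANDWIDTH = 0.5      # look only at students very near the cutoff
TRUE_EFFECT = 5.0    # hypothetical long-run benefit of admission

students = []
for _ in range(50_000):
    score = random.uniform(50, 90)
    admitted = score >= CUTOFF                   # sharp admission rule
    outcome = 0.5 * score + random.gauss(0, 3)   # smooth trend in the score
    if admitted:
        outcome += TRUE_EFFECT                   # discontinuous jump at cutoff
    students.append((score, outcome))

just_above = [o for s, o in students if CUTOFF <= s < CUTOFF + BANDWIDTH]
just_below = [o for s, o in students if CUTOFF - BANDWIDTH <= s < CUTOFF]

# With a narrow bandwidth, the difference in means approximates the jump.
rd_estimate = sum(just_above) / len(just_above) - sum(just_below) / len(just_below)
print(rd_estimate)  # close to TRUE_EFFECT (small bias from the within-band trend)
```

A real analysis would fit local regressions on each side of the cutoff rather than comparing raw means, but the identifying idea is the same: students just above and just below the cutoff are essentially interchangeable except for admission.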

They find that test score gains are no greater for students who were admitted to the schools their parents preferred than those not admitted.  For boys there are some signs that the effect on test score gains may actually be negative.  But when they look at longer-term outcomes, including educational attainment, employment, and earnings, they find significant benefits for students who were admitted to the schools the parents preferred.  These positive effects were driven mostly by gains for girls.  When they explore mechanisms for why these gains occurred, they find a significant reduction in teen motherhood for girls admitted to preferred schools, which contributed to their educational attainment and later employment and earnings.  They also found that both boys and girls experienced significant long-term health benefits as measured by a healthy BMI, regular exercise, and dental check-ups if they gained admission to the schools their parents preferred.  The researchers conclude: “This suggests that preferred schools may promote productive habits and attitudes that are not measured by test scores but contribute to overall well-being. This may represent a significant, previously undocumented, return to school quality.”

So parents, on average, can detect important aspects of school quality that technocrats guided by test scores would get wrong.  Technocrats would conclude that the schools parents prefer do nothing to improve student outcomes because test scores don’t rise, or even go down, when students get into the school their parents want.  But parents are smarter than the technocrats.  They prefer schools that improve long-term outcomes for their children.  Specifically, parents seem to be able to choose schools that are more effective in developing the “character” of their children, making the students less likely to get pregnant as teens and more likely to engage in positive health behaviors later.  For boys this may not make a big difference in the labor market (although it does not harm those outcomes), but for girls these health improvements seem to drive higher educational attainment, employment, and earnings.

This study is consistent with a long line of research that finds a disconnect between short-term test score outcomes and long-term life outcomes, as described in a recent meta-analysis by my colleagues Mike McShane, Pat Wolf, and Collin Hitt.  It’s amazing to me how champions of the technocratic approach continue to have faith that they have access to scientific tools for identifying school quality that less well-informed parents lack, despite the growing body of scientific evidence demonstrating the very real defects of the technocratic approach.  Despite their daily hymns of praise to science, the technocrats don’t seem very scientific at all.

 


The Gates Effective Teaching Initiative Fails to Improve Student Outcomes

June 21, 2018

Rand has released its evaluation of the Gates Foundation’s Intensive Partnerships for Effective Teaching initiative, and the results are disappointing.  As the report summary describes it, “Overall, however, the initiative did not achieve its goals for student achievement or graduation, particularly for LIM [low income minority] students.” But in traditional contract-research-speak, this summary really understates what they found.  You have to slog through the 587 pages of the report and 196 pages of the appendices to find that the results didn’t just fall short of the goals; they were generally null to negative across a variety of outcomes.

Rand examined the Gates effort to develop new measures of teacher effectiveness and align teacher employment, compensation, and training practices to those measures of effectiveness in three school districts and a handful of charter management organizations.  According to the report, “From 2009 through 2016, total IP [Intensive Partnership] spending (i.e., expenditures that could be directly associated with the components of the IP initiative) across the seven sites was $575 million.”  In addition, Rand estimates that the cost of staff time to conduct the evaluations to measure effectiveness totaled about $73 million in 2014-15, a single year of the program.  Assuming that this staff time cost was the same across the 7 years of the program they examined, the total cost of this initiative exceeded $1 billion.  The Gates Foundation paid $212 million of this cost, with the rest being covered primarily by “site funds,” which I believe means local tax dollars.  The federal government also contributed a significant portion of the funding.
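The arithmetic behind that billion-dollar figure is worth making explicit.  Here it is as a quick back-of-the-envelope check, using the figures quoted above and the stated assumption that the 2014-15 staff-time cost held for all seven years:

```python
# All figures in millions of dollars, as reported by Rand.
direct_spending = 575        # total IP spending, 2009-2016, across the seven sites
staff_time_per_year = 73     # estimated evaluation staff time in 2014-15
years = 7                    # assume that single-year cost held across all 7 years

total = direct_spending + staff_time_per_year * years
gates_share = 212 / total    # the Gates Foundation paid $212 million

print(total)                 # 1086, i.e., just over $1 billion
print(round(gates_share, 2)) # 0.2, so roughly 80% of the cost came from elsewhere
```

In other words, the Foundation's $212 million leveraged about four additional dollars of other people's money for every dollar of its own.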

So what did we get for $1 billion?  Not much.  One outcome Rand examined was whether the initiative made schools more likely to hire effective teachers.  The study concluded:

Our analysis found little evidence that new policies related to recruitment, hiring, and new-teacher support led to sites hiring more-effective teachers. Although the site TE [teacher effectiveness] scores of newly hired teachers increased over time in some sites, these changes appear to be a result of inflation in the TE measure rather than improvements in the selection of candidates. We drew this conclusion because we did not observe changes in effectiveness as measured by study-calculated VAM scores, and we observed similar improvements in the site TE scores of more-experienced teachers.

Another outcome was the increased retention of effective teachers:

However, we found little evidence that the policies designed, in whole or in part, to improve the level of retention of effective teachers had the intended effect. The rate of retention of effective teachers did not increase over time as relevant policies were implemented (see the leftmost TE column of Table S.1). A similar analysis based only on measures of value added rather than on the site-calculated effectiveness composite reached the same conclusion (see the leftmost VAM column of Table S.1).

Did the program improve teacher effectiveness overall and specifically access by low income minority students to effective teachers?

…An analysis of the distribution of TE based on our measures of value added found that TE did not consistently improve in mathematics or reading in the three IP districts. There was very small improvement in effectiveness among mathematics teachers in HCPS [Hillsborough County] and SCS [Shelby County] and larger improvement among reading teachers in SCS, but there were also significant declines in effectiveness among reading teachers in HCPS and PPS [Pittsburgh]. In addition, in HCPS, LIM students’ overall access to effective teaching and LIM students’ school-level access to effective teaching declined in reading and mathematics during the period of the initiative (see Table S.2). In the other districts, LIM students did not have consistently greater access to effective teaching before, during, or after the IP initiative.

And was there an overall change as a result of the program in student achievement and graduation rates?

Our analyses of student test results and graduation rates showed no evidence of widespread positive impact on student outcomes six years after the IP initiative was first funded in 2009–2010. As in previous years, there were few significant impacts across grades and subjects in the IP sites.

Here I think the report is casting a more positive spin on the results than their findings show.  Check out this summary of results from each of the sites:

I see a lot more red (significant and negative effects) than green (significant and positive effects). The report’s overall conclusion is technically true only because it focuses on the last year (2014-15) and because it examines each of these 4 sites separately.  A combined analysis across sites and across time, which they don’t provide, would likely show a significant and negative overall effect on test scores.
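A combined analysis of that kind would typically pool the site-by-year estimates with inverse-variance weights.  The sketch below uses hypothetical effect estimates and standard errors (not numbers taken from the Rand report) just to show how a scattering of mostly-negative, individually mixed site results can aggregate into a significant negative pooled effect:

```python
import numpy as np

# Hypothetical site-by-year effect estimates (in student-level SD units)
# and their standard errors -- illustrative only, not from the Rand report.
effects = np.array([-0.05, -0.08, 0.02, -0.04, -0.06, 0.01])
ses = np.array([0.02, 0.03, 0.02, 0.025, 0.03, 0.02])

# Fixed-effect (inverse-variance) pooling across sites and years.
weights = 1.0 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
z = pooled / pooled_se

# With these made-up inputs, the pooled effect is negative and |z| > 1.96,
# i.e., statistically significant even though two site estimates are positive.
print(round(pooled, 3), round(z, 2))
```

The point of the sketch is only that aggregation can reveal a significant overall pattern that a site-by-site, year-by-year reading obscures.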

The attainment effects are also mostly negative.  To find the attainment results at all, you have to dive into a separate appendix file.  There you will see that Pittsburgh experienced a decrease in dropout rates of between 1.3 and 3.5%, depending on the year, which is a positive result.  But Shelby County showed a significant decrease in graduation rates in every year but one.  While the dropout rate, unlike the graduation rate, is an annualized measure, the decrease in Shelby County’s graduation rate was as large as 15.7%.  The charter schools also showed a significant decrease in graduation rates as a result of the program in every year but one, with the decline as large as 6.6%.  And Hillsborough experienced a significant increase in its dropout rate of about 1.5% in one year.  In three of the four sites examined there were significant, negative effects on attainment; in one site there were positive effects.

The difference-in-differences analysis that Rand uses is not perfect at isolating causal effects.  And as the report notes, comparison districts were also sometimes implementing reform strategies similar to those of the Partnership sites.  But you would expect that the injection of several hundred million dollars and considerable expert attention would improve implementation in the Partnership districts, so the comparison is still informative.  Besides, the fact that some comparison districts were pursuing some of the same reforms does not explain the splattering of red (negative and significant effects) we see.
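For readers unfamiliar with the method, difference-in-differences compares the change over time in a treated site with the change over the same period in a comparison site, netting out trends common to both.  A minimal sketch with made-up numbers (not the report's data):

```python
# Minimal difference-in-differences illustration (not Rand's actual model).
# Mean test scores before and after the reform, in a Partnership site and
# a comparison site; all numbers are invented for illustration.
partnership = {"before": 250.0, "after": 251.0}
comparison = {"before": 249.0, "after": 253.0}

# The treatment effect estimate is the treated site's change minus the
# comparison site's change over the same window.
did = (partnership["after"] - partnership["before"]) - (
    comparison["after"] - comparison["before"]
)
print(did)  # -3.0: the Partnership site gained less than its comparison
```

Both sites improved in raw terms, yet the estimated effect is negative, which is exactly the kind of result the red cells in Rand's summary represent.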

As Mike McShane and I note in the book we recently edited on failure in education reform, there is nothing inherently wrong with trying a reform and having it fail.  The key is learning from failure so that we avoid repeating the same mistakes.  It is pretty clear that the Gates effective teaching reform effort failed badly.  It cost a fortune.  It produced significant political turmoil and distracted from other, more promising efforts.  And it appears to have generally done more harm than good with respect to student achievement and attainment outcomes.

The Rand report draws at least one appropriate lesson from this experience:

A favorite saying in the educational measurement community is that one does not fatten a hog by weighing it. The IP initiative might have failed to achieve its goals because the sites were better at implementing measures of effectiveness than at using them to improve student outcomes. Contrary to the developers’ expectations, and for a variety of reasons described in the report, the sites were not able to use the information to improve the effectiveness of their existing teachers through individualized PD, CLs, or coaching and mentoring.

 


Max Eden May Be One of the Only Education Reporters Left, and He Isn’t Even a Reporter

June 12, 2018

 

Max Eden may be one of the only education reporters left, and he isn’t even a reporter. His article in The 74 today describing how changes in discipline policy led to the severe deterioration of behavior in a New York City school may be one of the best pieces of education journalism I have read in many years.  It is thoroughly documented, clearly described, and conveys a compelling and alarming story about how discipline reform may go awry.

To be clear, Max does not prove in this piece that discipline reform necessarily or even typically leads to these problems.  But that is not what journalism does.  Reporting raises issues that social science can then examine, using its methods to adjudicate whether these patterns are causal and systematic.  The problem is that too many people seem to confuse journalism and social science and think that only the latter should exist.

I came to this realization as I was wondering why I so rarely come across the kind of quality journalism contained in Max’s piece.  What are education reporters doing instead?  First, we unfortunately have far fewer education journalists than we used to.  Education is mostly a local story and local newspapers and their ranks of education reporters have been decimated by the rise of internet news over the last two decades.  Second, the national and often foundation-subsidized outlets we have left are often focused on advancing various agendas, whether reform-oriented or partisan, and seem to have little interest in the type of in-depth reporting contained in Max’s piece.

Third, and perhaps most alarming, is that there is a new type of education journalist who imagines himself or herself as a mini-social scientist who adjudicates for us what “the research says.”  Despite having no social science training or experience conducting research, this new breed of education journalist holds forth on the correct interpretation of the social science evidence.  Often they do this on Twitter, whose short format does not allow for in-depth discussion.  Anyone can sound like an expert in a few hundred characters.

But the truth is that there is usually no simple narrative about what social science has to say and reporters are very poorly positioned to adjudicate the truth about social science.  In the past, reporters understood this and used to leave claims about what the evidence says to researchers.  Reporters who covered research saw their role as quoting competing researchers so audiences could get some understanding of the issues in dispute.

Not anymore. Now this new breed of faux social scientist/reporter regularly holds forth on what the evidence tells us.  And not surprisingly, the cool kid club of social scientists whose research is affirmed by this new breed of reporter has plenty of praise to heap upon the reporter for being so smart and wise as to say that the researcher is correct. These reporters and researchers have formed a mutual admiration society.  Any criticism of either reporters or researchers in this tight circle is met with considerable outrage and reiteration of praise for each other, typically on Twitter.

If reporters are going to start masquerading as social scientists, I suppose it is only right that others should step in and start to play the role of journalists.  The world doesn’t need (and is little influenced by) reporters pretending to be social scientists and adjudicating what the evidence says.  But the world does need, and our research agenda will be influenced by, the type of in-depth reporting that Max Eden has done in his new article on discipline in a New York City school.


Learning from a Study Abroad Course in Israel

June 1, 2018


Much of my recent research has focused on what students learn from field trips.  I’m inclined to believe that direct exposure to enriching activities conveys a significant amount of learning that cannot easily be obtained from abstract instruction in classrooms.

I don’t see this as something that only K-12 schools should consider.  Graduate training in education policy may also benefit significantly from exposing doctoral students to more and varied school experiences.  It’s fine to learn how to manage large databases or how to do the latest adjustment to standard errors, but too few education policy programs are teaching their students to think more deeply about our field, including asking bigger and more interesting questions or considering how policy may need to vary across contexts.

The doctoral program in education policy at the University of Arkansas, however, is making a concerted effort to give our Ph.D. students more and varied direct experiences in schools.  In particular, we prioritize having students work on randomized field experiments in which they collect their own data in a variety of schools.  Seeing first-hand how programs are operating and understanding the messiness involved in data collection teaches our students things about education policy that they could never learn by staring at numbers in a spreadsheet all day long.  Field trips may be just as important for doctoral training as K-12 education.

To further our commitment to this graduate-level version of field trips, Bob Costrell and I developed, and have just completed leading, a study abroad course in Israel for our doctoral students.  Two cohorts of our Ph.D. students were offered the opportunity to tour Israel for 10 days, following the completion of assigned readings and a few days of preparation.  Our tour included discussions with experts at Hebrew University, Shalem College, and the Shoresh Institution, as well as visits to school programs in Jerusalem, the Galil, and Tel Aviv.  Because our Department is in a very strong position financially, we were able to offer this course at essentially no cost, and all of the eligible students chose to participate.

Why did we take our graduate students to Israel and what did they learn from going there?  We went to Israel because it has many of the same educational challenges we face in the U.S.  Their test scores are lagging in international comparisons.  They have stubborn education gaps that have resisted efforts to close them.  They have centralized standards, curriculum, and test-based accountability along with decentralized school choice that struggle to balance individual freedom and national unity.

But if we just wanted to see educational challenges like our own, we could have visited schools in the U.S.  The real benefit of going to Israel was seeing similar challenges being addressed in very different contexts.  It became abundantly clear that many of the reforms we pursue in the U.S. do not work the same way in Israel and vice versa.

For example, the “tight-loose” approach favored by supporters of Common Core combines centralized standards with school autonomy over how best to teach those standards.  In the U.S., tight-loose tends to devolve quickly into tight-tight, as schools are so eager to comply with central mandates that they focus narrowly on centrally determined objectives and exercise relatively little autonomy in selecting different paths for achieving those objectives.  In Israel, the “start-up” culture facilitates less slavish obedience to central mandates and more school and teacher autonomy in how centrally determined objectives are achieved.  Of course, these are broad observations, and there is considerable variability within both the U.S. and Israel.

But the point is that context matters.  The same policy with the same incentive system will work very differently in different places, across as well as within countries.  The dominant economic approach to education policy tends to think of schools and educators as interchangeable widgets: the same policy with the same incentives should produce the same results.  Then we are shocked each time we try at scale something that worked as a pilot program, only to discover that we get very different results.  Rick Hess has been warning us about how much the context of policy implementation matters, but perhaps we can only really learn this lesson when we see those very different contexts for ourselves.

There are also problems associated with educational tourism.  There is a risk that people will select on the dependent variable and draw lessons about what “works” without relying on a reasonably rigorous research design.  While there are limits to what can be learned with confidence from direct experience, there are also limits to what can be learned from large data sets abstracted from context.  Education policy needs to do a better job of balancing rigorous methods with contextual understanding.  In the Department of Education Reform’s doctoral program we are striving to achieve that balance and are committing our resources to provide that balanced training to our students.
