The Wagner Epic Continues

February 3, 2009

No, not that Wagner.  There is more on Tony Wagner, the snake-oil-salesman educational consultant.  My op-ed on Wagner ran in the Northwest Arkansas Morning News.  I’ve also reprinted the text below, since it is easier to read than the scanned PDF in the link.

The first community discussion on Wagner’s book, The Global Achievement Gap, was held last night.  It wasn’t too bad.  A number of teachers (at tables other than mine) expressed resentment at the suggestion that they weren’t already aware that critical thinking and creativity were desirable.  But administrators and GT teachers seemed more enamored with the book.  And the reaction from parents and community members included a fair degree of skepticism.

It’s hard to get people to think critically about people who push a focus on critical thinking.  To be for critical thinking is like being for goodness and light.  The tricky part is in how you get there.  To the extent that Wagner has any concrete suggestions, he seems to be taking folks down the wrong path.  He wants less emphasis on content and less testing.  But he shows no evidence that higher levels of critical thinking can be found in places or at times when there was less content and less testing.  In fact, the little evidence he does provide would suggest the opposite.

Some smart folks are pushing back against these data-free educational consultants.  Sandra Stotsky had an op-ed on Wagner last week.  Dan Willingham had an excellent blog post on Alfie Kohn, as did Stuart Buck.  And Robert Pondiscio at Core Knowledge, Andy Rotherham at Eduwonk, and Ken De Rosa at D-Ed Reckoning have added their two cents (which, with the new stimulus package, will become 2 trillion cents).

So here is my op-ed pasted below:

Fayetteville Public Schools Need Evidence, Not Snake-Oil (submitted title)
By Jay P. Greene

              The Fayetteville Public Schools purchased 2,000 copies of Tony Wagner’s The Global Achievement Gap and organized a series of public fora to discuss how that book might guide our schools.  The District is to be commended for engaging the community in this process.  But it is unclear why the District selected Wagner’s book as the focus of this discussion.

Wagner’s book makes claims about what skills students really need to learn, what is blocking them from learning those skills, what countries are more successful in teaching these skills, and what some schools are doing to remedy the problem.  But he provides no systematic evidence to substantiate any one of these claims.  In short, the book is a series of anecdotes that more closely resembles what one would find in a self-help manual than in a work of social science.  If we apply our critical thinking skills, which Wagner says are essential, we should reject this book as a sound basis for planning the future of Fayetteville schools.

First, Wagner says there are seven essential survival skills that our children need to learn.  How does he know that these are the essential skills?  He chatted with a CEO on an airplane and selected a few more to interview.  Does he review any research on the types of skills that predict who will become successful adults?  No. Wagner relies upon the authority of his experience and the experiences of a handful of corporate executives to identify the essential skills.  Accepting claims on this basis would be the sort of thing we would hope people with critical thinking skills might reject.

Frankly, the seven skills he lists — critical thinking, collaboration, adaptability, initiative, communication, analysis, and imagination — seem reasonable enough, but they are also so vague as to be unhelpful in informing schools about what to do.  How exactly do we produce critical thinking or adaptability or creativity?  It’s not as if educators have been unaware of these goals, but they haven’t generally been effective at developing strategies to achieve them.

Then Wagner identifies what he believes is blocking the acquisition of these seven essential skills – high stakes testing.  What evidence does he present to support this claim?  Again, he presents no systematic evidence to demonstrate that there is a tradeoff between the content knowledge required in accountability testing and the essential skills he wants.  Couldn’t it be the case that improving mastery of basic skills and content knowledge provides the foundation for these seven skills?  It’s hard to be imaginative, analytical, etc… without knowing subject matter and basic skills of literacy and numeracy.  Einstein may have said “imagination is more important than knowledge,” as the book’s dedication indicates, but Einstein couldn’t have succeeded without a firm grasp of advanced mathematics.

If Wagner were right that accountability testing undermines essential skills, then surely these skills must have been more plentiful before testing became as salient as it is today.  But Wagner does not (and cannot) provide any evidence to show that.  Instead, he shows (on p. 74) that students in the United States significantly lag students in Finland, Hong Kong-China, Japan, and Korea in certain problem solving skills on an international test called PISA.  As it turns out, high stakes testing is extremely prominent in most of these countries with strong problem-solving results – a fact curiously at odds with Wagner’s claims.  If accountability testing undermines essential skills, why do countries with such strong accountability systems manage to succeed so well in teaching the essential skills Wagner wants?

Wagner describes three model schools that he says have been effective at teaching essential skills (although he again fails to provide any evidence that they are as successful as he claims).  But it is by no means clear that the approaches adopted by these three schools are the only valid approaches or that they could be replicated easily by others.  Replication is especially problematic because the three models he provides are all charter schools or alternative schools of choice.  Perhaps the secret of these schools’ success has something to do with school choice and not the features he describes.  If true, it’s not clear how Fayetteville could imitate the success of these schools.

To achieve our goals in education we have to adopt approaches backed by systematic evidence.  If we believe critical thinking, collaboration, and creativity are the most important goals for schools, then we need systematic evidence on systems of teacher preparation, curriculum, and pedagogy that effectively produce those goals.  There is a growing body of scientific research on these issues, including a number of studies sponsored by the U.S. Department of Education’s Institute of Education Sciences, that the Fayetteville Public Schools might wish to consider rather than consulting with the latest peddler of educational snake-oil.


Jeb for National School Grades

January 23, 2009

BUSH EDUCATION SUMMIT

“Everybody do the FCAT! Yeah!”

HT Orlando Sentinel

(Guest post by Greg Forster)

This morning, Jeb Bush comes out for a national school grading system on NRO.

What he’s proposing is a federal grade A to F for each school, based on both performance level and improvement – kind of the way Florida schools are graded under the A+ system (though Jeb doesn’t propose federal sanctions for poorly performing schools, just a grading system). He justifies the move on grounds that the NCLB system encourages states to lower standards.

Jeb doesn’t discuss this in the article, but readers of JPGB know that a clash has been brewing between Florida’s A+ program and NCLB. Florida, which has had success with the A+ program (where improvement in performance is a factor alongside performance level), is going to run into the 2014 “everybody must be proficient” wall along with everyone else.

No doubt our own Matt Ladner, chronicler of the looming conflict in the posts linked above, will have more to say about this (hopefully including some more classy artistic illustrations), but just to put my own two cents in, I’m not clear on why there needs to be a national grade.

For that matter, I’m not even convinced we need a national test, since that sacrifices the merits of interstate competition. At both the state and federal levels, the test is being developed and implemented by a bureaucracy that is heavily colonized by the defenders of the status quo and thus will be looking for opportunities to dumb down the test or manipulate the scoring to make schools look better. But if one state dumbs down while another (under political pressure from reformers) stays the course and makes real improvement, that creates pressure on the dumb state to get with it.

The impetus for a single national test, it seems to me, is that federal rewards and punishments create an incentive to dumb down. If we’re not going to have rewards and punishments based on the scores, what’s the need for a single national test? Why not just require each state to maintain a transparent testing system of its own devising – or, if that’s not good enough, require each state to purchase and use one of the major privately developed national tests?

But we can leave that aside. Let’s stipulate the case for a national test. Still, if you’re not going to hold schools accountable with rewards and penalties, then why issue grades along with the test scores? Why not just give a test and report the results numerically, and let private organizations put together their own grading systems? That way people can decide for themselves what aspects of performance measurement matter most, rather than turning the job over to a federal bureaucracy that has an incentive to make schools look better.


Quality Counts Lacks Quality

January 9, 2009

(Guest post by Stuart Buck)

Education Week has released its annual report “Quality Counts,” which ranks all fifty states’ education systems along several dimensions, such as school finance, achievement, accountability, and the like.  You can find detailed statistics for any given state on an interactive map, and you can generate a table comparing the states of your choosing.

This Quality Counts report gets a huge amount of attention, as can be seen from the hundreds of results in a search of Google News.

But the Quality Counts report suffers from two glaring flaws.  In fact, the report reminds me of the old joke (I can’t remember who to credit for this) of a beggar sitting on the streets of New York, with a sign reading, “Wars, 2; Legs Lost, 1; Wives Who Left Me, 2; Children, 3; Lost Jobs, 2.  TOTAL: 10.”  Well, obviously, the number “10” doesn’t represent ten of anything.

 So what’s wrong with the Quality Counts report?

First, the “School Finance” measure has two basic components: equity and spending.  Equity refers to several measures that look at whether a state’s districts get relatively equal funding.  Fair enough, although there’s a decent argument that impoverished districts might need higher spending to attract better personnel.  But then part of the “School Finance” measure is based on per-pupil spending, as well as the percentage of a state’s taxable resources dedicated to education. 

The problem here is that it doesn’t make sense to reward a state with a higher grade just for spending more, in and of itself.  Indeed, the “spending” measure ends up getting averaged with the measure for “K-12 Achievement.”  This means that, in theory, a state with high spending and low achievement — thus combining incompetence and extravagance — could get an overall score equal to a state with low spending and high achievement.  But if a state manages to get high achievement with low spending, this means that, all else equal, it has a more efficient and productive education system.

Second, an even worse problem lies in the “Chance for Success” measure.  This ranking is supposed to tell us about the chances that people in a given state have of succeeding.  There are numerous components to the “Chance for Success” measure, including percent of students above 200% of the poverty line, percent of students with college-educated parents, percent of children whose parents speak English, and more.  Not surprisingly, the richer and more privileged states like Massachusetts, New Jersey, and Connecticut do quite well on this measure, while states like Arkansas, Mississippi, and New Mexico are near the bottom.

What makes no sense whatsoever is that a high score on the “Chance for Success” measure is averaged together with all the other items — including K-12 Achievement — to produce each state’s final score.  You can see this for yourself: Pick your home state here, and then take the simple average of all six measures (Chance for Success; Standards, Assessment & Accountability; K-12 Achievement; Transitions & Alignment; School Finance; and Teaching Profession), and that average will be the state’s overall final score.

In other words, imagine a state that managed to produce A-level achievement even though its population was poor and disadvantaged (and thus got a lower grade on the “Chance for Success” measure).  Under any rational grading system, we should give that state the highest possible rating.  But the Quality Counts method would actually downgrade the state for having too many poor children.  By the same token, Quality Counts would upgrade a poor-achieving state that happened to have a privileged and rich student population, even though that state’s education system would obviously be far more incompetent and inefficient.  If anything, the “Chance for Success” ranking should be counted inversely as compared to all the other measures of a state’s education system.
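To make the arithmetic concrete, here is a minimal sketch in Python, using made-up category scores rather than Education Week's actual numbers, of two hypothetical states run through a straight six-way average:

```python
# Illustrative only: hypothetical scores, not Education Week's actual data.
# Quality Counts rolls six category scores (here on a 0-100 scale) into one
# overall score with a simple unweighted average.

CATEGORIES = [
    "Chance for Success",
    "Standards, Assessment & Accountability",
    "K-12 Achievement",
    "Transitions & Alignment",
    "School Finance",
    "Teaching Profession",
]

def overall_score(scores):
    """Simple unweighted average of the six category scores."""
    return sum(scores[c] for c in CATEGORIES) / len(CATEGORIES)

# Hypothetical State X: poor, disadvantaged population, but high achievement.
state_x = {
    "Chance for Success": 65,
    "Standards, Assessment & Accountability": 85,
    "K-12 Achievement": 90,
    "Transitions & Alignment": 80,
    "School Finance": 70,   # low spending drags this category down
    "Teaching Profession": 80,
}

# Hypothetical State Y: rich, privileged population, but mediocre achievement.
state_y = {
    "Chance for Success": 95,
    "Standards, Assessment & Accountability": 85,
    "K-12 Achievement": 70,
    "Transitions & Alignment": 80,
    "School Finance": 90,   # high spending boosts this category
    "Teaching Profession": 80,
}

print(overall_score(state_x))  # about 78.3
print(overall_score(state_y))  # about 83.3
# State Y outranks State X despite achieving less with more money.
```

Education Week of course builds each category score from many weighted subindicators; the point of the sketch is only that the final step, an unweighted average across the six categories, lets demographics and raw spending offset actual achievement.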


Great Minds Think Alike

December 10, 2008

I’m not the only one to compare the auto bailout to K-12 education.  Andrew Coulson has a great piece with that theme over at Cato.

Also, I’m not the only one to see Rorschach (or is it Horshack) inkblots in the TIMSS results spin-fest.  Rob Pondiscio over at Core Knowledge also references Horshack — er, I mean, Rorschach.  And a little birdie named Mike Petrilli told me that he also has a Rorschach/Horshack piece in the forthcoming Gadfly.

If this continues I’m going to have to re-position my lead helmet that keeps others from reading my thoughts with their Alpha rays.


The TIMSS Rorschach Test

December 9, 2008

The Rorschach inkblot test is a psychology test that was used to assess personality and emotions.  The way in which people saw ambiguous images, like the one above, was supposed to say something about who they really were.

The same is true for the interpretations being applied to the results of the 2007 TIMSS (Trends in International Mathematics and Science Study) released today.

Over at Flypaper, Mike Petrilli interprets the gains the US has made in math but not science as suggesting that accountability testing is shifting resources toward math and away from science: “The lesson is that what gets tested gets taught. Under the No Child Left Behind act, and state accountability systems before that, elementary schools have been held accountable for boosting performance in math and reading. There is evidence that American elementary schools are spending less time teaching science, and this is showing up in the international testing data.”

And Mike interprets the relatively good results that Minnesota had (yes, MN took the test as if it were a country) as supporting rigorous standards: “There’s also good news out of Minnesota today, which has made dramatic gains since adopting new, more rigorous math standards.”

But also at Flypaper, Diane Ravitch offers different interpretations.  She sees the gains even in math results as “actually small, only four points.”  She also declines to credit NCLB for any of those gains, even as a perverse result of resource shifting away from science.  She notes that gains were at least as large in the US during the period prior to implementation of NCLB.  And on the topic of Minnesota she takes issue with Mike’s explanation for success: “Minnesota showed dramatic gains on TIMSS not because of ‘new, more rigorous standards,’ but because of that state’s decision to implement a coherent grade-by-grade curriculum in mathematics.”  Umm, I would explain the difference but I got so bored trying to distinguish standards from curriculum that I dozed off for a bit.

Rather than focusing on the gains (or lack of gains) made by the US relative to itself in the past, Mark Schneider at Education Week focuses on the comparison between the US and other countries.  He notes that while the US looks relatively strong on the TIMSS, that is distorted by the large number of  “low-performing countries in the calculation of the international average [including Jordan, Romania, Morocco, and South Africa that] drives down that average, improving the relative performance of our students.”

He further notes that we fare worse on the PISA, which reports results from the 30 OECD countries that are our major trading partners and economic competitors: “We do better in TIMSS than we do on PISA, but this is a function of the countries that participate in each, and we should not let the relatively good TIMSS results lull us into a false sense of complacency. Even in the relatively easier playing field of TIMSS, we are lagging far too many countries in overall math performance and in the performance of our best students.”

And at Huffington Post Gerald Bracey was able to offer his reaction to the results last week, before they were released.  He wrote: “It might be good to keep a few things in mind when considering the data:

1. The Institute for Management Development rates the U. S. #1 in global competitiveness.

2. The World Economic Forum ranks the U. S. #1 in global competitiveness.

3. The U. S. has the most productive workforce in the world.

4. “The fact is that test-score comparisons tell us little about the quality of education in any country.” (Iris Rotberg, Education Week June 11, 2008).

5. ‘That the U. S., the world’s top economic performing country, was found to have schooling attainments that are only middling casts fundamental doubts on the value, and approach, of these surveys…'”

Bracey also said that our students could beat up the students in other countries with higher TIMSS scores.  (Actually, I made that last bit up.)

To summarize, Mike Petrilli sees evidence supporting his past concerns about the narrowing of the curriculum and the need for rigorous standards.  Diane Ravitch sees no evidence to alter her negative view of NCLB.  Mark Schneider, the former head of the National Center for Education Statistics, sees the need to look beyond TIMSS to tougher international comparisons like PISA.  And Gerald Bracey doesn’t even have to see the results to know that our education system is doing a great job.  And when I look at the inkblot I see a pudgy guy with a beard and male-pattern baldness laughing.

(edited for clarity)


Replication, The True Test of Research Quality

December 2, 2008

When people can’t argue the facts, they argue peer review.  That’s been my experience when I’ve released non-peer reviewed reports.  Without peer review, folks wonder, how can we know whether to trust these results?

The reality is that even with peer review people still need to wonder whether to trust results.  Peer review is by definition irresponsible — by which I mean that the reviewers have no responsibility.  By being anonymous, reviewers offer their opinions on the merit of research without any meaningful consequence to themselves.  Many reviewers do a laudable job, but there is nothing to stop them from using their reviews to advance findings they prefer and block findings they dislike regardless of the true merit of the work.  Peer review is often little more than the anonymous committee vote of a panel composed of some mix of competitors and allies.  It is about as reliable as the Miss Congeniality vote at a beauty contest.  Do we really think she’s the nicest contestant or did the other contestants voting anonymously have ulterior motives for burying her with faint praise?

The true test of research quality is replication.  Science doesn’t determine the truth by having an anonymous committee vote on what is true.  Science identifies the truth by replicating past experiments and applying them to new situations to see if the results continue to hold up.

I’m pleased to say that several pieces of my work have been successfully replicated.  By successful replication I mean that the basic findings are upheld.  Replicators almost always make new and different choices about how to handle data or run an analysis.  The question is whether the same basic conclusion is found even when those different choices are made.

The evaluation I did with Paul Peterson and Jiangtao Du of the Milwaukee voucher experiment was successfully replicated by Cecilia Rouse.  The evaluation I did of the Charlotte voucher program was successfully replicated by Josh Cowen.  My study of Florida’s A+ voucher and accountability program was successfully replicated three times — by Raj Chakrabarti; Rouse, et al.; and West and Peterson.  And my graduation rate work has been successfully replicated by Rob Warren and Chris Swanson.

The interesting thing is that every one of my studies above was initially released without peer review.  And every one of them was attacked for being unreliable because it was not peer reviewed.  When they were all later published in peer-reviewed journals (except the grad rate work) and successfully replicated, I don’t remember ever hearing anyone retract their accusations of unreliability.

(edited for typos)


The Stupidity of “Smart”

November 17, 2008

The next time I hear someone call for “smart” regulation, “smart” growth, “smart” boards, or “smart” anything I’m going to have to pull a Dr. Evil and get them to zip it — zip it good. 

Appending “smart” before something for which you are advocating is not only a very worn and tired tactic, it is also — for lack of a better word — stupid.  It’s stupid because simply labeling something as smart does not make it so.  Even worse, adding the label “smart” is intentionally ambiguous, allowing the audience to imagine that the “smart” adjective includes whatever people prefer and excludes whatever they oppose, even though everyone is imagining a different set of what is included or excluded by “smart.” 

A lot of normally smart and good people have fallen into the “smart” rhetorical ditch.  Mike Petrilli over at Flypaper was rightly opposing efforts to re-regulate education when he urged: “But the answer is not a return to old-fashioned regulation, but a move to smart regulation.”  That’s like fingernails scratching a smart board.

And sometimes the addition of “smart” negates the  noun it is modifying in an Orwellian fashion.  So, “smart” growth really seems to mean no growth or at least highly restricted growth.  That’s a fine position to take, but it is just bullying to imply that all other positions are not “smart.”  Rather than bullying others and disguising what one is really advocating with the “smart” trick, people should just come out and say what they prefer. 

Mike Petrilli prefers less regulation in education.  The proper term for that view is de-regulation, not smart regulation.  Saying de-regulation at least specifies the direction in which he thinks policy should go, while advocating for “smart” regulation reveals nothing about the preferred direction.  That doesn’t mean he favors the elimination of all regulation.  It’s just that in general he prefers less.  And he makes some effort to tell us what kinds of regulations he would like to eliminate and which should remain.

I agree with him.  But I have one regulation to propose.  Let’s stop talking about “smart” regulation.  Or, if we have to develop vapid and deceptive marketing slogans for our proposals, I suggest that we follow the spirit of DJ Super-Awesome and replace “smart” with “super-awesome.”  If we start talking about “super-awesome regulation,” the stupidity of “smart” will be more obvious.


Grading New York

November 13, 2008

(Guest post by Greg Forster)

Our old friend and colleague Marcus Winters has just released a study on New York City’s school grading program:

In 2006-07, New York City, the largest school district in the United States, decided it would follow several other school systems in adopting a progress report program. Under its program, the city grades schools from A to F according to an accumulating point system based on the weighted average of measurements of school environment, students’ performance, and students’ academic progress.

The implementation of these progress reports has not been without controversy. While many argue that they inform parents about public school quality and encourage schools to improve, others contend that grades lower morale at low-performing schools. To date there has been too little empirical information about the program’s effectiveness to settle these questions.

Schools that receive D and F grades repeatedly are subject to takeover by the city. A previous study (Rockoff and Turner 2008) found positive results from the program but lacked student-level data. Marcus’s study has got student-level data, regression discontinuity – the whole smash. Tale of the tape:

Students in schools earning an F grade made overall improvements in math the following year, though these improvements occurred primarily among fifth-graders.

Students in F-graded schools did no better or worse in English than students in schools that were not graded F.

Whatever problems NCLB may have, school accountability does work in places where state and local government have the political will to do it seriously. Even in places where the problems seem intractable, like New York City.

EMTs are standing by in case certain people’s heads explode.


PJM on Candidates’ Education Flip-Flops

November 3, 2008

(Guest post by Greg Forster)

Over the weekend Pajamas Media carried my column on how Obama and Palin have flip-flopped on education:

Suppose I told you Candidate A has supported rigorous academic standards, has stood up to the teachers’ unions — even been booed by them at their convention — and proclaimed the free-market principles that schools should compete for students and better teachers should get higher salaries. On the other hand, Candidate B says that competition hurts schools, that kids should be taught a radical left-wing civics curriculum, that we should throw more money at teachers’ unions — excuse me, at schools — and that rigorous academic standards should be replaced with the unions’ old lower-the-bar favorite, “portfolio assessment.”

Candidate A is Barack Obama. So is Candidate B.

Meanwhile, Candidate C has made an alliance with the teachers’ unions, opposed school choice, thrown money at the unions — excuse me, at schools — and even helped undermine a badly needed reform of bloated union pensions. On the other hand, Candidate D has broken with the teachers’ unions, demanded that schools should have to compete for students, and endorsed the most radical federal education reform agenda ever proposed by a national candidate, including a national school choice program for all disabled students.

Candidate C is Sarah Palin. So is Candidate D.

Important disclaimer:

None of this implies anything about the overall merits of any of these candidates. One can love a candidate overall while hating his or her stand on education, and vice versa.


Memo to Gadfly: History Failure Is a Historical Problem

October 21, 2008

“Okay, Mr. Hancock, if you’re so smart, how many of the freedoms protected in the Magna Carta can YOU name?”

(Guest post by Greg Forster)

The new Gadfly includes a guest editorial lamenting that our students don’t know civics, history, geography, etc. The editorial claims social studies is being “squeezed out” by accountability programs and that we should be “reinserting history and related subjects back into the curriculum.”

All this assumes that the failure of public schools to teach social studies effectively and the resulting colossal student ignorance of civics is a new phenomenon. Otherwise, the claim that social studies is being “squeezed out” and the call to “reinsert” it would make no sense.

But in fact this is not a new phenomenon. The catastrophic failure of social studies education in public schools is a subject with a long history. So what does that do to the story that social studies is being “squeezed out”?

Let me be clear: if I thought it were true that social studies was being squeezed out, but I also thought the squeeze would improve our 70% national graduation rate (50% urban) and reduce the rampant illiteracy and innumeracy even among those who do “graduate,” I would consider that a price well worth paying – and I say that as a social scientist. But the evidence that social studies is being squeezed out is not in fact convincing.