Eugenics: A Case Study of the Dangers of Technocracy

May 7, 2018


Technocracy is the belief that government should be run by experts, with policies shaped by scientific evidence.  Advocates of technocracy have little enthusiasm for people making decisions about their own lives or those of their children because people too often choose the wrong thing.  Experts, guided by evidence, are much better situated to shape people’s decisions so that they work best for themselves and others.

Technocracy rose to prominence during the Progressive Era, but it has hardly lost its appeal to elites since then.  It is clearly the dominant mode of thought among education policy experts.  In fact, at the most recent annual conference of the Association for Education Finance and Policy, attendees wore buttons declaring their creed, “Evidence-Based.”  Let’s leave aside that appending “Based” to “Evidence” seems to negate what it is modifying, like “natural flavoring” or “based on a true story.” And let’s acknowledge that evidence is, of course, extremely useful for making good decisions.  But the motivation behind this button and the thinking that pervades education experts is that policy should be “based” on evidence, not merely informed by it.  Evidence is the foundation.  Technocracy should rule.

To repeat, evidence is a good thing.  But claims about what the evidence really says are often in dispute and science is a very limited and imperfect enterprise.  So to be ruled by evidence rather than informed by it is extremely dangerous.  Consider the example of eugenics, which is “the science of improving a human population by controlled breeding to increase the occurrence of desirable heritable characteristics.”  Eugenics is now considered thoroughly disreputable, but for several decades it was the consensus approach of our scientific elite.  Its science was widely respected and its practices and policy recommendations were “evidence-based.”

It’s a little too easy to dismiss eugenics as a horrible error of our pre-scientific past.  For several decades, it was the scientific present of the most respected elites.  As Sol Gittleman put it: “The presidents of MIT, Stanford, Cornell, and Harvard all supported eugenics research, and as early as 1914, academic courses on the subject were taught at Harvard, Columbia, Cornell, Brown, Wisconsin, Northwestern, Clark, and MIT. Presidents Theodore Roosevelt and Woodrow Wilson, meanwhile, spoke openly and wrote freely about ‘racial suicide’—their term for what would happen if the nation permitted the mixing of races.”

While laws against the “mixing of races” had been introduced during slavery, a flurry of new laws was adopted as a result of this scientific inquiry into eugenics, such that 41 of the then 48 states eventually had such laws in place.  You could say that these laws were “evidence-based.”  In addition, laws calling for the forced sterilization of people deemed to be “feeble-minded” were adopted and ultimately upheld by the U.S. Supreme Court.  Justice Oliver Wendell Holmes famously declared in his decision, “Three generations of imbeciles are enough.”  This ruling by the Supreme Court was also considered “evidence-based.”

During World War II, President Franklin Roosevelt organized a secret committee to consider what to do with the large number of war refugees, especially Jews, whom he expected to flee Europe after the war.  Roosevelt asked Aleš Hrdlička, curator of physical anthropology at the Smithsonian Museum of Natural History, to head this secret planning group.  It’s worth quoting Steve Usdin’s account of this episode at length:

The two men had carried on a lively correspondence for over a decade and the President had absorbed the scientist’s theories about racial mixtures and eugenics. Roosevelt, the scion of two families that considered themselves American aristocrats, was especially attracted to Hrdlička’s notions of human racial “stock.”

A prominent public intellectual who had dominated American physical anthropology for decades, Hrdlička was convinced of the superiority of the white race and obsessed with racial identity. Shortly after the Pearl Harbor attack he’d written to Roosevelt expressing the view that the “less developed skulls” of Japanese were proof that they were innately warlike and had a lower level of evolutionary development than other races. The president wrote back asking whether the “Japanese problem” could be solved through mass interbreeding.

Roosevelt had long resisted opening the doors to large numbers of immigrants, not as a result of political expediency, but based on his understanding of what science had to say on the matter.  In 1925 Roosevelt had written a series of columns for the Macon Telegraph in which he praised Canada’s immigration policies, which were designed “to prevent large groups of foreign born from congregating in any one locality…. If, twenty-five years ago, the United States had adopted a policy of this kind we would not have the huge foreign sections which exist in so many of our cities.”

This evidence-based resistance to increasing immigration condemned countless European Jews to their death.  It also informed the findings of the secret committee he organized as to what to do with Jewish refugees following the war: “The solution, which the President endorsed, ‘essentially is to spread the Jews thin all over the world,’ rather than allow them to congregate anywhere in large numbers.”  Apparently, he hoped to improve their stock through inter-breeding, as he speculated might be done to reduce war-like tendencies among the Japanese.

Keep in mind, eugenics was not championed by a fringe group.  It was championed by the presidents of leading universities, researchers at the Smithsonian, and several presidents of the United States.  I’m proud to note that my alma mater, Tufts University, never offered a course in eugenics, and a Tufts medical professor, Abraham Myerson, was a leading critic of the idea, including in his testimony against forced sterilization of the “feeble-minded.” But Tufts was the exception, while more elite universities like Harvard and MIT actively pursued eugenics.  Only the close association between eugenics and the Nazis eventually brought the idea into disrepute.

Before we turn over policymaking to the current scholars at Harvard and MIT, we might want to reflect on how wrong evidence-based policies can be.  And rather than smugly asserting that past scholars were quacks while current ones are true scientists, we might want to learn the lessons of humility that the eugenics episode teaches.  Let’s be informed by evidence, but not be evidence-based.


Lessons from Failure

April 30, 2018


Mike McShane and I have an article in the Phi Delta Kappan Magazine summarizing the lessons we learned from our edited book on Failure.

We took the contributions by Larry Cuban (from Stanford University), Matthew DiCarlo (the Shanker Institute), Anna Egalite (North Carolina State University), Rick Hess and Paige Wiley (the American Enterprise Institute), Ashley Jochim (the Center for Reinventing Public Education), Matthew Ladner (the Charles Koch Institute), Megan Tompkins-Stange (the University of Michigan), Martin West (Harvard University), and Daniel Willingham (the University of Virginia) and boiled them down to three trade-offs and three lessons.

But if like Hillel I had to state what we learned while standing on one foot, I’d say, “Education is an inherently political enterprise, so if you try too hard to substitute normal political processes with the authority of technical expertise, you will fail.”


Theater Experiment in Educational Researcher

April 24, 2018

You don’t have to wait until tomorrow, tomorrow, or tomorrow.  Our article on the effects of student groups seeing live theater is available on Educational Researcher today!

The article is an updated and peer-reviewed version of the article we posted on SSRN last fall.  In it we discuss the combined results of five experiments we conducted in which students were randomly assigned to go on a field trip to see live theater or be in the control group.  In two of those experiments we added a second treatment condition in which students went on a field trip to see a movie version of the play.  We found that students randomly assigned to see live theater experienced significantly higher tolerance and social perspective taking as well as stronger knowledge of the plot and vocabulary of the plays than the control group.  Being randomly assigned to the movie treatment did not produce these same benefits.

So there seems to be something about experiencing live theater that cannot easily be produced by watching a movie instead.  Given how often schools are inclined to watch movies and how rarely they are now willing to go see live theater, these results are quite relevant.


And the Higgy Goes to… John Wiley Bryant

April 17, 2018


Today taxes are due, so it is time to announce the recipient of this year’s William Higinbotham Inhumanitarian Award.  We had many (un)worthy nominees, so it was difficult selecting the winner (loser).  My nominee, Derek Jeter, is certainly annoying in trying to make us eat our baseball vegetables by denying fans the fun distraction of mascot races while the team loses a lot of baseball games, or rather, goes through its rebuilding phase. But the criteria for awarding The Higgy state that: “‘The Higgy’ will highlight individuals whose arrogant delusions of shaping the world to meet their own will outweigh the positive qualities they possess.”  So, there should be some amount of coercion in whoever receives The Higgy and Jeter is not really forcing anyone to have no fun at baseball games.  If anything, it is my own darn fault for being a Marlins fan.  Jeter is just doing a poor job of running the team, but I am free to become a fan of another team or enjoy something else.

Jason’s nominee, Traci Wilke, was a principal who punished a student for secretly recording a teacher making threats against another student.  There is clearly an element of coercion in the principal’s behavior, but if we started awarding Higgies to every school administrator who suppressed the revelation of unflattering information, we’d run out of space on the internet.  It would be like handing out speeding tickets at the Indy 500.

This year’s Higgy really comes down to Greg’s nominee, Romanus Cessario, or Matt’s nominee, John Wiley Bryant.  Greg’s nominee is certainly vile for defending the forced abduction of a Jewish child because he believes Catholic doctrine requires it.  It almost feels like the sort of argument one might make as a freshman in college to see what ridiculous extremes you might reach if you followed a certain idea to its bitter end.  But this is a serious grown-up writing in First Things, which was once a respectable outlet.  As Greg notes, the really insidious part of the article is that it reveals how much social conservatives seem to be willing to abandon liberalism.  The way I’d put it is that these days you don’t have to scratch much beneath the surface to discover how many Jew-hating authoritarians there really are out there.

But I think Cessario falls short because he has no ability to shape the world to his ends.  Writing this kind of drivel has about as much influence on the world as the guy sitting on the park bench muttering to himself about how things will be different when he is in charge.  Greg is right that abducting children is BSDD, but I think writing in defense of it falls short of being PLDD.  The too-easy embrace of authoritarianism and Jew-hating by social conservatives is alarming, but Cessario is a very mediocre anti-Semite.  He couldn’t even achieve excellence at that.

John Wiley Bryant is the most deserving of this year’s Higgy because he arrogantly and coercively sought to reshape the world in a way he imagined would be better, but ended up making it significantly worse.  Like Matt and many other people of our generation, I gained significant cultural literacy (and had a ton of fun) watching Bugs Bunny cartoons.  For trying to force us to watch “educational” television instead of freely choosing quality programming, John Wiley Bryant is awarded the William Higinbotham Inhumanitarian Award.  He joins last year’s winner, Plato, the 2016 winner, Chris Christie, the 2015 winner, Jonathan Gruber, the 2014 winner, Paul G. Kirk, and the inaugural winner, Pascal Monnet.


Update — Thanks to Greg for being our official Higgy Historian and remembering earlier winners.

 


What’s Wrong With Portfolio Management in Louisiana?

April 16, 2018


Education reform seems to be consumed by a string of fads.  When things don’t work out, we tend to move on to the next fad without reflecting very much on what went wrong so that we might avoid that error in the future.  Mike McShane and I recently edited a book on Failure, which explicitly attempted to correct this problem by acknowledging failures and trying to draw lessons from them.

One of the recent fads that enchanted reformers was Portfolio Management, which was supposed to ensure that only high-quality school options were available to families.  It’s beginning to be painfully clear that Portfolio Management is failing.  It appears to be failing politically, as Denver retreated from Portfolio Management before it even really got going and New Orleans shifted control of the portfolio back to the long-reviled traditional school district board.  But now there is some evidence to suggest that Portfolio Management is suffering educationally as well.

To the extent that NAEP results are informative about school quality (and I’ve previously expressed my doubts about this), test scores for Louisiana charter schools have been falling off a cliff. In 8th grade math, for example, scores rose to as high as 280 in 2013, but dropped to 264 in 2017.  A change of 10 scale points is supposed to correspond roughly to a grade level, so this is a pretty precipitous drop over the last four years.  In 8th grade reading, scores rose to as high as 261 in 2013 before falling to 254 in 2017.  4th grade reading and math scores have similarly declined.
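For readers who want to see the arithmetic behind "a pretty precipitous drop," here is a minimal sketch that applies the rough 10-points-per-grade-level rule of thumb to the scores cited above. The function and constant names are my own, chosen for illustration, and the conversion is only the informal convention mentioned in this post, not an official NAEP calculation.

```python
# Rule of thumb quoted in the post: roughly 10 NAEP scale points per grade level.
# This is an informal convention, not an official NAEP conversion.
POINTS_PER_GRADE_LEVEL = 10


def grade_level_change(start_score, end_score, points_per_grade=POINTS_PER_GRADE_LEVEL):
    """Convert a change in scale scores into approximate grade levels."""
    return (end_score - start_score) / points_per_grade


# Louisiana charter-school scores cited above (2013 peak, then 2017).
math_change = grade_level_change(280, 264)
reading_change = grade_level_change(261, 254)

print(f"8th grade math: {math_change:+.1f} grade levels")
print(f"8th grade reading: {reading_change:+.1f} grade levels")
```

By this rough yardstick, the math decline is about 1.6 grade levels and the reading decline about 0.7, which says nothing about what caused either one.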

I’d like to hear what champions of the Louisiana portfolio model think is going on.  I thought Portfolio Management was supposed to give us only high quality options — and it largely relies on test scores as an indicator of quality — so why are the scores dropping?  Are Portfolio Managers actually not very good at predicting quality?  Have there been other regulatory changes that came along with Portfolio Management that have harmed the educational environment?  For example, the leaders of the Recovery School District were at the forefront of eliminating exclusionary discipline from schools.  Could the change in school discipline have eroded behavioral control and harmed achievement?  Of course, it is always possible that there have been changes in the composition of students in charter schools which have caused these declines, although virtually all schools in New Orleans are charters and the composition of the city has not changed that much in 4 years.

But it is important to remember that just eyeballing NAEP scores is a horrible way to assess causal effects of programs, so we should be very wary of attributing any change in scores to any policy or practice.  Nonetheless, NAEP is useful for raising questions and generating hypotheses.  I’d like to hear the hypotheses that supporters of Portfolio Management have to offer that might account for the precipitous drop in NAEP scores in Louisiana’s charter sector over the last several years.


A Modest Counter-Proposal: Eliminate Think Tanks, Not College Sports

April 12, 2018


The Urban Institute has a piece that explores how many low-income students could be offered college scholarships if college athletics were eliminated.  As it turns out, most college sports programs spend more than they receive in revenue (although this excludes other possible sources of revenue, such as donations and additional enrollment that may be related to support for the sports program).  Erica Blom, who is a research fellow at the Urban Institute, calculates that if you remove the amount of money devoted to athletic scholarships, sports programs at the top 230 institutions lose about $798 million per year.  At $4,000 per scholarship, she estimates that eliminating college sports could add another 199,400 scholarships that universities could offer to low income students.  Blom acknowledges that sports may produce some benefits, but she concludes that “many of these benefits, however, can arguably be gained through participation in intramural sports.”
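The scholarship estimate is simple division, and a quick sketch makes the calculation explicit. The variable names below are mine, and the inputs are only the rounded figures quoted in this post; the unrounded numbers behind Blom's estimate are not given here.

```python
# Figures as quoted in the post (rounded); variable names are illustrative.
ANNUAL_LOSS_DOLLARS = 798_000_000   # "about $798 million per year"
SCHOLARSHIP_DOLLARS = 4_000         # "$4,000 per scholarship"
QUOTED_SCHOLARSHIPS = 199_400       # scholarship count cited in the post

# Scholarships implied by the rounded loss figure.
scholarships_from_rounded_loss = ANNUAL_LOSS_DOLLARS // SCHOLARSHIP_DOLLARS

# Loss figure implied by the quoted scholarship count.
implied_loss_dollars = QUOTED_SCHOLARSHIPS * SCHOLARSHIP_DOLLARS

print(scholarships_from_rounded_loss)
print(implied_loss_dollars)
```

The rounded inputs give 199,500 scholarships, slightly more than the 199,400 cited, while the cited count corresponds to a loss of $797.6 million. The small gap is presumably due to rounding in the "about $798 million" figure, though that is my inference rather than something the report states.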

Let’s leave aside whether she is accurately calculating the net cost of these sports programs.  And let’s leave aside her general conclusion that these programs produce little or no benefit in exchange for their cost.  And let’s further leave aside her implicit assumption that there is a superior benefit to be realized from spending the money instead on nearly 200,000 partial scholarships for students who may have a low probability of completing degrees for which they would be induced to take on substantial debt even with their partial scholarships.  What Blom’s argument boils down to is that she doesn’t like college sports and would rather that the money be spent on something else.  She’s not alone in this view.  Many education analysts have a distaste for college sports and would prefer those resources be diverted to other activities.

Of course, we could all pick different aspects of higher education that we think are of dubious value, and, without having to prove it, argue that money devoted to those functions should be spent on something that we think is better (without having to prove that it is really better).  I might say that there are entire degree programs, often ending in the word “studies,” that confer little or no benefit (and maybe even harm) on students and yet in aggregate cost hundreds of millions of dollars.  Why don’t we do away with those to pay for more scholarships?  The only reason people can so easily suggest abolishing college sports, as opposed to some other university effort, is that there is — quite literally — a prejudice against college athletics among education expert-types.

Most people outside of the edu-policy sphere, however, place a very high value on college sports.  This doesn’t just include the millions of fans who spend billions of dollars on watching sporting events and buying merchandise, but also the thousands upon thousands of college students who participate in college athletics each year.  They clearly think these activities have value, so why should we substitute Blom’s preference for theirs?  It’s not as if the handful of largely ambiguous social science studies to which Blom links prove that these millions of people are suffering from false consciousness in finding benefits in college sports.

To illustrate how specious it is to favor abolishing college sports to pay for scholarships, let me offer a counter-proposal: let’s abolish think tanks.  According to a 2013 report, a group of 21 think tanks generated $1.076 billion in annual revenue.  That’s just 21 of them.  There are scores more.  The social benefit of the funds devoted to these think tanks is highly dubious.  Like college sports, think tanks have been marred by corruption and sex scandals. And think tanks facilitate a large amount of PLDD where people sit in their offices imagining how they would run the world better, doing things like taxing snacks, building light rail, or abolishing college athletics.  Wouldn’t the world be better off if we eliminated think tanks and used those funds to pay for college scholarships instead?

See?  Isn’t this fun?  Let’s all imagine things we don’t like and fantasize how things would be so much better if only those resources were devoted to things we did like.  We could get jobs in think tanks to do it.


Derek Jeter for “The Higgy”

April 5, 2018


It was bad enough that Marlins fans had to suffer under previous team owner Jeff Loria’s inept management of the team (after impressively winning the World Series twice in the first ten years of the team’s existence).  And it’s even worse that a new ownership group led by former Yankee star, Derek Jeter, has dismantled what may have been the best outfield in baseball to conserve cash since the new owners seem financially exhausted immediately after having made the purchase. But the action that has significantly detracted from the human condition and made Jeter worthy of a nomination for the William Higinbotham Inhumanitarian Award is his decision to cancel The Great Sea Race.

Since 2012 the Marlins have held a race among mascots dressed in sea creature outfits during the 6th inning of home games.  This contest between Bob the Shark, Julio the Octopus, Angel the Stone Crab, and Spike the Sea Dragon, however, has come to a halt under the new ownership led by Jeter.  The exact reasons have not been given, but it appears that the new ownership group seems to view baseball as a serious enterprise, deserving of reverence.  As one commentator put it: “[Jeter] doesn’t want fans to have anything to smile about this year.”

The late, great Bill Veeck understood what baseball was really about: fun.  As owner of the Cleveland Indians, St. Louis Browns, and Chicago White Sox he packed the games with bread and circuses to make the experience entertaining.  He hired Max Patkin, the “Clown Prince of Baseball,” to coach the Indians.  Patkin “had a face seemingly made of rubber that could make many shapes. He was rail thin and wore a baggy uniform with a question mark (?) on the back in place of a number, and a ballcap that was always askew. While some derided his act as corny, he became a beloved figure in baseball circles…”

In St. Louis, “some of Veeck’s most memorable publicity stunts occurred during his tenure with the Browns, including the appearance on August 19, 1951, by Eddie Gaedel, who stood 3 feet 7 inches tall and is the shortest person to appear in a Major League Baseball game. Veeck sent Gaedel to pinch hit in the bottom of the first of the game. Wearing elf like shoes and ‘1/8’ as his uniform number, Gaedel was walked on four straight pitches and then was pulled for a pinch runner.  Shortly afterwards ‘Grandstand Manager’s Day’ – involving Veeck, Connie Mack, and thousands of regular fans, enabled the crowd to vote on various in-game strategic decisions by holding up placards: the Browns won, 5–3, snapping a four-game losing streak.”

And in Chicago, Veeck was the first to introduce an electronic scoreboard that lit up, made noises, and shot fireworks when the White Sox hit a home run.  And in one of my greatest childhood memories, Veeck organized Disco Demolition Night, in which local DJ, Steve Dahl, dressed in a military uniform, drove a Jeep into center field, and blew up a pile of disco records during the middle of a double-header.  Dahl got the crowd so riled up that they stormed the field, causing enough damage that the White Sox had to forfeit the second game.  It was hilarious!  Veeck was also the one to convince Harry Caray, who was the announcer for the White Sox at that time, to sing “Take Me Out to the Ballgame” to the crowd during the 7th inning stretch.  Caray tried to refuse but Veeck said that he had a recording of Caray singing and would play it over the audio if Caray refused to perform it live.  Caray continued this practice of singing to the crowd when he moved to the Northside to become the announcer for the Cubs.  Thus was a much-loved baseball tradition born.

Veeck’s teams also played very well.  Under Veeck’s leadership, the Indians won their first pennant in 20 years in 1948.  In 1959, Veeck’s White Sox won their first pennant in 40 years. But Veeck understood that most teams fall short most years, so they have to offer their fans something other than victory to keep everyone entertained.  Yes, they should offer quality play, but sometimes even that is beyond the reach of most teams.  As we Marlins fans (can I use the plural for that?) wait for the team to rebuild with only moderate chances of being successful for many years, can’t we at least watch some frickin’ fish run around the outfield?

Jeter seems to represent the type of baseball fan who has watched Field of Dreams a few too many times.  The game is not a sacred ritual, deserving of church-like propriety.  It’s entertainment. It should be fun. For trying to make us all eat our baseball vegetables and taking away The Great Sea Race, Derek Jeter is worthy of The Higgy.

 


Museums and Theaters Should Stop Telling Me What to Think About Art

April 3, 2018


Museums and theaters should stop telling me what to think about art.  I know that the folks who run museums and theaters think they are just providing context and facilitating discussion, but too often they are actually attempting to control what their patrons think about art works and plays with excessive gallery text and after show “talkbacks.”

I have no expertise in curating galleries or presenting plays, but I can speak as a frequent consumer of the arts that this well-intentioned, but ultimately bossy, deluge of information interferes with my direct experience and enjoyment of the art.  And I’m not the only one who feels this way.  Last year the playwright, David Mamet, forbade talkbacks following his plays.  He was mocked by some in the theater community for this, but I understand what drove his action.  Too many theaters were hosting talkbacks after his plays in which the theater staff or an expert they selected were obviously steering the audience toward particular and simplified interpretations of his work that might make it less controversial.  As another playwright, Christopher Shinn, put it: “Broadly speaking, theaters use talkbacks to protect the audience from uncomfortable feelings the play may have aroused.”

I’m all for discussing plays after you see them, but that’s why you should go for dinner or drinks afterwards. When theaters host the discussion they cannot help but use their authority to drive the discussion in certain directions.  I don’t want my theater to tell me what to think about the play I just saw.  I want to develop my own thoughts and talk to others without the mediation of self-appointed experts on its meaning.

Shaping what people think about a play is especially likely if the theater-facilitated discussion immediately follows the performance.  That’s why some theaters hold events at separate times or in separate locations, more clearly demarcating the interpretation from the performance itself.  This seems like a reasonable compromise for theaters concerned about expanding audience engagement without being too controlling.

Some art museums have also drifted away from excessive gallery text.  The Isabella Stewart Gardner Museum in Boston, for example, emphasizes: “Isabella created her installations to evoke an emotional response in visitors. That’s why, unlike at other institutions, there aren’t conventional labels in this museum. She wanted you to find your own meanings.”  But other museums, while trying to avoid “the priestly voice of absolute authority,” still feel obliged to cover their walls in verbiage about the social and historical context of the works they display.

I sympathize with the impulses of museum staff to try to help their patrons, but I fear that they have too little trust in the ability of the art to communicate without mediation.  In addition, social and historical contexts are complicated and often disputed, so when museums try to convey that context they are inevitably making choices about what the correct understanding of history and sociology should be.  I am no more interested in having my museum tell me what to think about the world than having it tell me what to think about art.


Also posted on the University of Arkansas’ NEA Research Lab Blog.

 


The Pre-Spinning of NAEP Results

April 2, 2018


NAEP results are being released next week, but state departments of education have already been briefed on their results.  State education officials are leaking like sieves, so many edu-pundits have at least some inkling of what’s coming.  Rumors from multiple sources suggest that the results generally look bad — with a decline nationally.  Aware that they may be blamed for declines, a number of folks are anticipating the release by placing their own spin on the soon-to-be-released results.

Exhibit A in this pre-spinning is John White, who is the superintendent of education in Louisiana.  According to Matt Barnum at Chalkbeat, White sent a letter to the U.S. Department of Education on March 23 raising concerns about the comparability of NAEP results over time given the transition to computer-administered testing.  Although he acknowledges that NCES made adjustments to ensure the comparability of paper-and-pencil and computer-administered tests and found insignificant differential effects by sub-group, White still raises questions about whether differential effects may distort results for certain states.  Barnum notes: “Even though researchers warn that it is inappropriate to judge specific policies by raw NAEP results, if White’s letter is a signal that Louisiana’s scores have fallen, that could deal a blow to his controversial tenure, where he’s pushed for vouchers and charter schools, the Common Core, letter grades for schools, and an overhaul of curriculum.  White said his state’s results are not what’s driving his concerns.”  Hmmm.  Maybe it’s just a remarkable coincidence that White has suddenly developed these technical concerns about the validity of NAEP at about the same time that he was briefed on his state’s results.  How much do you want to bet that there is a decline in LA?

Exhibit B is Arne Duncan taking to the pages of the Washington Post to defend the idea that ed reform has contributed to significant improvement.  He focuses on trends over the last few decades.  That would be a smart move to focus on long-term gains if recent trends — you know, in the wake of Duncan’s tenure as U.S. Secretary of Education — have been taking a nose-dive.  NAEP results slipped for the first time when 2015 results were released.  How much do you want to bet that national results have declined again?

Barnum is right to warn people away from inferring too much from changes in NAEP as an indicator of the success of any particular policy or education leader, but these folks live in a political, not a research, world.  Both White’s and Duncan’s political standing in education policy was built on over-claiming from NAEP results, and those who live by the NAEP sword may die by it.  That’s why they better start spinning.


Did Changed Test Questions Cause National Decline in Smarter Balanced Scores?

March 26, 2018

(Guest Post by Douglas J. McRae and Williamson M. Evers)

Did Smarter Balanced mishandle the bank of test-questions for 2017? Test scores dropped in virtually all states using Smarter Balanced national tests in their statewide testing programs in 2017.

States that used the Common Core-aligned Smarter Balanced tests showed English/Language Arts and Mathematics composite declines for 11 of the 14 states using these tests in spring 2017, neither a loss nor a gain for 2 states, and a very modest gain for only a single state. Looking only at E/LA scores, there are declines for 13 states and only a tiny fraction of one percent gain for one state.

Tony Alpert, executive director for the Smarter Balanced Assessment Consortium, argues these results show that scores from its states are on a plateau. No, instead, there has been a substantial consortium-wide decline in scores.

We can compare the 2017 declines to consortium-wide composite score gains for 13 out of 14 Smarter Balanced states in 2016, and to composite gains for the parallel PARCC consortium for 2017 for all except one state. Both of these comparisons make the 2017 Smarter Balanced declines look like a sore thumb pointed downwards.

Yet Smarter Balanced continues to stonewall against releasing actual evidence or independent analysis of data contributing to the 2017 declines in test scores.

Alpert’s Jan. 26 opinion piece acknowledged for the first time that the 2017 item bank was changed from the 2016 item bank in a significant way. All information released by Smarter Balanced prior to Jan 26 indicated that the 2016 item bank was unchanged from 2015, and there was no public notice that the 2017 item bank was in fact modified from the 2016 item bank.  This lack of transparency from Smarter Balanced adds to concerns that the 2017 declines may be traced to changes for the 2017 test-question bank.

Alpert says that the item bank was “similar between the two years.” Well, “similar” isn’t good enough for valid, reliable gain-scores from year-to-year or trend data over multiple years — certainly one of the major goals for any K-12 large-scale statewide testing program. We need more evidence than an assertion of similarity.

To generate valid, reliable gain scores from year-to-year, a test maker has to document that changes made to any item bank do not change its alignment to the academic content standards being measured, and do not change the coverage of what is in the blueprints for the test. In addition, the balance of easy-medium-hard test questions has to match the prior item bank, or scoring adjustments need to be made to reflect changes. This information has to be available before a modified item bank can be used for actual test administrations.

A glimpse into this information surfaced on Feb 9 in a document linked to an Education Week post on this issue. The link was to a technical report by Smarter Balanced subcontractor AIR dated Oct 2016 (but not made public by Smarter Balanced until recently). It includes an appendix on changes in item characteristics for the Smarter Balanced operational item banks for 2016 and 2017; its charts show the addition of more “easy” items for E/LA and Math for grades 3 and 4, and the addition of more “hard” items for E/LA and Math for grades 5, 6, 7, and 8. This mix of additional items for the 2017 testing cycle indicates the 2017 item bank had more difficult items than the 2016 item bank, which, unless Smarter Balanced adjusted its scoring specifications for the 2017 test, would be consistent with the decline in scores from 2016 to 2017 documented in late September 2017. Smarter Balanced has not released information to date on whether the scoring specifications were adjusted for differences in difficulty of the tests on a grade-by-grade basis for both E/LA and Math for the spring 2017 test administration cycle.

In addition, a test maker should monitor the item-by-item data for a revised item bank early during an actual test administration cycle to ensure the new items added to the bank (whether replacing former items or expanding the size of the bank) are performing as anticipated, so that scoring rules can be further modified as needed to keep results comparable from year-to-year. According to the Feb. 9 Education Week post, Smarter Balanced said they were now doing these analyses, long after the fact.

Smarter Balanced’s lack of transparency about critical information on this issue is quite troubling. So far, Smarter Balanced has released no information confirming these routine test integrity activities were done prior to scoring and releasing spring 2017 test results for 5-6 million students from 14 states.  Smarter Balanced has the behind-closed-doors data from the 2017 testing cycle; those data are required for informing any changes for the 2018 testing cycle, which is already underway. If the 2017 data do not inform changes for the 2018 testing cycle, the lack of routine comparability activities will affect Smarter Balanced annual gain scores and trend data for years into the future.

If Smarter Balanced has the evidence outlined above to justify their claim “we have every reason to believe that the scores accurately describe what students knew and were able to do in spring 2017,” then the professional thing to do was to make that information available concurrent with the release of 2017 test results in the fall of 2017, with perhaps more comprehensive analyses released before the beginning of the next test administration cycle.

Such transparency would have informed students, teachers, parents, school & state administrators, state policy makers, the media, and the public of the integrity of previously released test results and would have offered justification for any changes for the upcoming testing cycle. More transparency from Smarter Balanced would also allow independent experts to review and validate their currently behind-closed-doors data. What is Smarter Balanced hiding? Why isn’t it transparent like any other professional testing organization?

We repeat our call for Smarter Balanced to open the wall of secrecy for the information needed to investigate their 2017 consortium-wide score decline problem, and allow their claims to be examined by independent experts. The Smarter Balanced January 29 opinion piece falls woefully short of providing evidence their 2017 tests provided comparable scores from year-to-year upon which to conduct gain and trend analyses. We deserve better analysis and explanations from Smarter Balanced, along with much greater transparency for all parties interested in statewide K-12 test results across the country.


Douglas J. McRae is a retired educational-measurement specialist from Monterey, California.  Williamson M. Evers is a research fellow at Stanford University’s Hoover Institution and a former U.S. Assistant Secretary of Education for Planning, Evaluation and Policy Development.