Shanker Institute Scholar Bounded in a Nutshell but Counts Himself a King of Infinite Space

January 15, 2013

(Guest Post by Matthew Ladner)

Matthew DiCarlo of the Shanker Institute has taken to reviewing the statistical evidence on the Florida K-12 reforms. DiCarlo concludes that we ultimately can't say much about what drove the aggregate movement in scores. He's rather emphatic on the point:

In the meantime, regardless of one’s opinion on whether the “Florida formula” is a success and/or should be exported to other states, the assertion that the reforms are responsible for the state’s increases in NAEP scores and FCAT proficiency rates during the late 1990s and 2000s not only violates basic principles of policy analysis, but it is also, at best, implausible. The reforms’ estimated effects, if any, tend to be quite small, and most of them are, by design, targeted at subgroups (e.g., the “lowest-performing” students and schools). Thus, even large impacts are no guarantee to show up at the aggregate statewide level (see the papers and reviews in the first footnote for more discussion).

DiCarlo obviously has formal training in the statistical dark arts, and the vast majority of academics involved in policy analysis would probably agree with his point of view. What he lacks, however, is an appreciation of the limitations of social science.

Social scientists are quite rightly obsessed with issues of causality. Statistical training quickly teaches the student that people constantly spin ad-hoc theories about some X causing some Y without much proof. Life abounds with half-baked models of reality, and the phenomena they purport to explain have a consistent and nasty habit of proving far more complex.

Social scientists have developed powerful statistical methods to attempt to establish causality; techniques like random assignment and regression discontinuity can illuminate causal questions. These types of studies can bring great value, but it is important to understand their limitations.

DiCarlo, for instance, reviews the literature on the impact of school choice in Florida. Random assignment school choice studies have consistently found modest but statistically significant test score gains for participating students. Some react to these studies with a bored "meh." DiCarlo helps himself along toward this conclusion by citing some non-random-assignment studies. More problematically, he fails to appreciate the limitations of even the best studies.

For example, even the very best random assignment school choice studies fall apart after a few short years. Students don’t live in social science laboratories but rather in the real world. Random lotteries can divide students into nearly identical groups with the main difference being that one group applied for but did not get to attend a charter or private school. They cannot however stop students in the control group from moving around.

Despite the best efforts of researchers, attrition begins to degrade control groups in random assignment studies immediately. Usually after three years, they are spent. Those seeking a definitive answer on the long-term impact of school choice on student test scores are in for disappointment. Social science has very real limits, and in this case it is only suggestive. Choice students tend to make small but cumulative gains year by year, gains which tend to become statistically significant around year three, which is right around when the random assignment design falls apart. What's the long-term impact? I'd like to know too, but it is beyond the power of social science to tell us, leading us to look for evidence from persistence rates.

So let's get back to DiCarlo, who wrote "The reforms' estimated effects, if any, tend to be quite small, and most of them are, by design, targeted at subgroups (e.g., the "lowest-performing" students and schools). Thus, even large impacts are no guarantee to show up at the aggregate statewide level."  This is true, but it fails to recognize the poverty of the social science approach itself.

DiCarlo mentions that "even large impacts are no guarantee to show up at the aggregate statewide level." This is a reference to the "ecological fallacy," which teaches us to employ extreme caution when moving between individual-level and aggregate-level data. Read the above link if you want all the brutally geeky reasons why this is the case; take my word for it if you don't.

DiCarlo is correct that connecting the individual level data (e.g. the studies he cites) to aggregate level gains is a dicey business. He however fails to appreciate the limitations of the studies he cites and the fact that the ecological fallacy problem cuts both ways. In other words, while generally positive, we simply don’t know the relationship between individual policies and aggregate gains.

We know, for instance, that we have a positive study on alternative certification and student learning gains. We do not, and essentially cannot, know how many NAEP point gains, if any, resulted from this policy. The proper reaction for a practical person interested in larger student learning gains can be summarized as "who cares?" The evidence we have indicates that students who had an alternatively certified teacher made larger learning gains. Given the lack of any positive evidence associated with teacher certification, that's going to be enough for most fair-minded people.


The impact of particular individual policies on Florida's gains is not clear. What is crystal clear, however, is that there were aggregate-level gains in Florida. You don't require a random assignment study or a regression equation when considering, for instance, the percentage of FCAT Level 1 reading scores (aka illiterate) above. When you see the percentage of African American students scoring at the lowest of five achievement levels drop from 41% to 26% on a test with consistent standards, it is little wonder that policymakers around the country have emulated the policy, despite DiCarlo's skepticism.

I could go on and bombard you with charts showing improving graduation rates, NAEP scores, Advanced Placement passing rates, etc., but I'll spare you. The point is that there are very clear signs of aggregate-level improvement in Florida, and also a large number of studies at the individual level showing positive results from individual policies.

The individual-level results do not "prove" that the reforms caused the aggregate-level gains. DiCarlo's problem is that they also certainly do not prove that they didn't. It has therefore been necessary from the beginning to examine other possible explanations for the aggregate gains. The problem here for skeptics is that the evidence weighs very much against them: Florida's K-12 population became both demographically and economically more challenging since the advent of reform, spending increases were the lowest in the country since the early 1990s (see Figure 4), and the other policies favored by skeptics came into play long after the improvement in scores began.

The problem for Florida reform skeptics, in short, is that there simply isn't any other plausible explanation for Florida's gains outside of the reforms. They flailed around with an unsophisticated story about 3rd grade retention and NAEP, unable and unwilling to explain the 3rd grade improvement shown above, among other problems. One of NEPC's crew once theorized at a public forum that Harry Potter books may have caused Florida's academic gains. DiCarlo has moved on to trying to split hairs with a literature review.

With large aggregate gains and plenty of positive research, the reasonable course is not to avoid doing any of the Florida reforms, but rather to do all of them. In the immortal words of Freud, sometimes a cigar really is just a cigar.

The Dark Days of Educational Measurement in the Sunshine State Ended in 1999

February 8, 2012

(Guest Post by Matthew Ladner)

Over on the Shanker Blog of the American Federation of Teachers, Matthew DiCarlo writes a thoughtful but ultimately misguided post, "A Dark Day for Education Measurement in the Sunshine State."

DiCarlo is obviously very bright, but a few critical misinterpretations have led him astray. DiCarlo demonstrates that family income is highly correlated with student test scores in Florida. No surprise: the same is true everywhere.

Having demonstrated this, DiCarlo develops a critique of Florida's school grading system. The Florida school grading system carefully balances overall performance on state exams with academic growth over time. Specifically, the formula weights student proficiency on state exams as 50% of a school's grade, 25% on the growth of all students, and the final 25% on the growth of students who scored in the bottom quartile on the previous year's exam.

The last bit is the clever part of the formula. By double-weighting the gains of students who are behind, the formula makes them the most important children in the building. Only students in the bottom quartile on last year's test count in all three categories.
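To make the weighting concrete, here is a minimal sketch of the 50/25/25 scheme described above. The component scores and the function name are hypothetical illustrations; the actual Florida formula awards points per component and maps point totals to letter grades.

```python
def school_grade_points(proficiency, growth_all, growth_bottom_quartile):
    """Combine the three components with the weights described in the post.

    Each argument is a hypothetical 0-100 component score:
      proficiency             -- overall proficiency on state exams (50%)
      growth_all              -- learning gains for all students (25%)
      growth_bottom_quartile  -- gains for last year's bottom quartile (25%)
    """
    return (0.50 * proficiency
            + 0.25 * growth_all
            + 0.25 * growth_bottom_quartile)

# A school that moves its lowest performers gets credit for it directly:
print(school_grade_points(proficiency=60, growth_all=55,
                          growth_bottom_quartile=70))  # 61.25
```

Note that bottom-quartile students also show up in the proficiency and all-student growth components, which is why the post says they count in all three categories.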

DiCarlo goes into the devilish details of how the state determines these gains, and concludes that some of the gains measures don't actually measure academic growth but instead effectively measure academic proficiency. The use of proficiency levels in determining gains is critical because students take a higher grade-level assessment with more rigorous content each year. If a student achieves a proficient score on the eighth grade FCAT and then again on the ninth grade FCAT, the student is performing at a higher level because the content is more difficult. Florida's system does not credit a learning gain for students performing Advanced one year but merely Proficient the next.

DiCarlo has failed to appreciate that the mastery of more challenging academic material from one grade to the next itself constitutes a form of academic growth.

The 9th grade student has now studied the mathematics curriculum of both 8th and 9th grade and has demonstrated proficiency in both the 8th grade material and the 9th grade material. Given a valid testing system, we can feel assured that the 9th grader knows more about math than he or she knew as an 8th grader. The growth in this case is staying on track in a progressively more challenging sequential curriculum.

The Florida system, in essence, uses proficiency levels to define which gains and drops count as meaningful. There is, of course, no "correct" way to structure such a system, and if 100 different people examined any given system they would likely have 500 different suggestions for improvement to match their preferences.

DiCarlo's notion of "fairness" seems to have distracted him from a far larger and more important issue: the utility of the Florida grading system, which, seen best at the school level, has improved student achievement for all students.

If you go back as far as the FCAT data system will take you for 3rd grade reading results by free and reduced lunch eligibility, you'll find that in 2002, 48% of Florida's free and reduced lunch students scored FCAT Level 3 or better. In the most recent data available, from 2010, 64% scored Level 3 or better. That is an enormous improvement in the percentage of students scoring at grade level or better.

In 2002, 60% of all Florida students scored Level 3 or above, and in 2010, 72% scored Level 3 or above. Free and reduced lunch eligible kids in 2010 outperformed ALL kids in 2002 by 4 percentage points. That's real progress. And the free and reduced lunch eligible children overtake the 2002 general-population averages in a large majority of grades tested.

The same pattern can be found in Florida’s NAEP data. For instance, in 1998, 48% of Florida’s free and reduced lunch eligible students scored “Below Basic” on the NAEP 8th grade reading test. In 2011, that number had fallen to 35%. If an “unfair” system helps to produce a 27% decline in the illiteracy rate among low-income students, I’d like to order up a grave injustice.
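For readers wondering where the 27% figure comes from, it is the relative decline in the Below Basic rate, not the percentage-point drop. A quick check of the arithmetic:

```python
# NAEP 8th grade reading, Florida free/reduced lunch students
# scoring "Below Basic" (figures cited in the post above).
before, after = 48.0, 35.0   # percent, 1998 vs. 2011

point_drop = before - after                        # 13 percentage points
relative_decline = (before - after) / before * 100 # decline relative to 1998

print(point_drop)                   # 13.0
print(round(relative_decline, 1))   # 27.1 -> the "27% decline" in the text
```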

The "Dark Days of Education Measurement in Florida," in my view, were before school grades. Academic failure lay concealed behind a fog of fuzzy labels, and Florida wallowed near the bottom of the NAEP exams. Back when there was little transparency and even less accountability, far more students failed to acquire the basic academic skills needed to succeed in life. While perhaps a lost golden age for educators and administrators wishing to avoid any responsibility for academic outcomes, it was a Dark Age for students, parents and taxpayers.

Ironically, DiCarlo has decried a system that has weakened the very link between family income and academic outcomes that his post demonstrates. Yes, the link is still strong in Florida, but it used to be much, much stronger.

Finally, can one truly complain about the “fairness” of a system providing more than ten times as many A/B grades as D/F grades? If anything, the Florida school grading system has grown too soft in my view (see chart above).

I’ve read enough of DiCarlo’s work to know that he is a thoughtful person, so I hope he will examine the evidence for himself and reconsider his stance. I don’t have any reason to think that the Florida system is perfect. I don’t think a perfect system exists, and I suspect that there are some changes to the Florida system that DiCarlo and I might actually agree on.

It seems difficult, however, to argue that the Florida system hasn't been useful, if one gives appropriate weight to the interests of students, parents and taxpayers in balancing those of school staff.